Introduction to design of file-systems
Table of Contents
1 Introduction
This document contains information related to design of file-systems. This is created to help as supporting slides or manual for a guest lecture to explain how file systems are designed.
2 Linux basics
For various demos we will be using a Linux machine. Some Linux basics are required to understand demos properly.
2.1 Device names
Linux operating systems have devices with names /dev/sda, /dev/sdb, etc. for serial block storage devices.
During the lecture we will use a pen drive (the smaller the better) for demo. The internal hard-disk of demo laptop is /dev/sda. The pen drive will automatically become /dev/sdb. Hence most of our demo will use device /dev/sdb
Please note that Linux has special virtual filesystems /sys, /dev and /proc. These folders do not contain real file or folders. These folders contain devices or special files which allow us to get or even change state of various systems (Power, Processes, Underlying hardware, etc.)
2.2 dd command
dd command can read blocks of data from a input device and write those blocks to output device. Simple dd usage is:
dd if=<input-file-or-device> <output-file-or-device>
This way we can use dd to copy a file a.txt to b.txt using:
dd if=a.txt of=b.txt
We can also use dd to copy all contents of a device (say /dev/sdb pen drive in our case) to a file
dd if=/dev/sdb of=pen-drive-image01.raw
This is useful is creating iso images from CD/DVDs. If CD/DVD device name is /dev/sr0, then we can create CD/DVD iso image using:
dd if=/dev/sr0 of=dvd.iso
2.3 /dev/zero device
There is special device /dev/zero. We can read infinite number of zeros from this device. This is useful for writing zeros to a file or device (in our case 1GB pen drive).
We need this as normal file-system creation or file deletion does not zeros all inode or data blocks (Will learn more about them in this lecture). Hence we can zero the device and then start our experiments to have a clean device without any junk/random values to work with.
2.4 xxd command
xxd command can read a normal file or a device and give its hexdump. We can use -s option to specify seek (From where the reading should start). We can use -l (length) to specify number of bytes to read.
2.5 Pipe operator (|)
We can use pipe operator (|) to take output from one command and give it to other command as input
2.6 more
We can use more to see information one page at a time.
3 Filesystems basics
We will first look at an a simple file-system (FAT). It has various variants 12-bit, 16-bit, Fat32, vfat, exfat etc. We will look at Fat32 file-systems with VFAT extension for purpose of our discussion to avoid confusion. This is the most common fat filesystem in use for small (<32 GB) size disks.
3.1 File Allocation Table
Similar to the way index or contents are used to store information about chapters or topics of book. FAT stores information about which files and directories are present and where is the data stored in these structures. Actual data is stored in data region and pointed by entries in FAT.
Note that FAT word is overloaded. Name of file-system is also FAT and the table that stores cluster allocation information for files and directories is also called FAT.
3.2 Directory
Note that as far as filesystem is considered directory is a special file which contains list of other files. Hence if a Directory test contains file a.txt and b.txt. That means there is a special file called test with directory flag set, with its data having names and information about a.txt, b.txt.
Here directory flag for test would be set in its inode. a.txt and b.txt information would be stored in data blocks.
3.3 Slack space
Fat32 filesystem for partitions <8GB, typically uses default cluster size of 4KB.
For example if we store 100 bytes of data in a file, then the corresponding data block with only have 100 bytes of data. Remaining 3996 bytes of space is left unused. We cannot use this unused space for any other purpose. Ideally this space should contain only zeros(0). But after using file-system for long time it is likely to have considerable junk information for previous files in the slack space area.
Space left after useful information before end of current 4096 bytes block is called slack space.
4 Partition table types
There are two types of partition tables:
- MSDOS
- This is the older partition table type used by Windows XP
and before OS's. This has limit of maximum 2TB disk
size. Further this has two types of partitions - Primary
and extended. There can be maximum 4 primary partitions.
If we need more partitions then one of the four
partitions is created not as primary but as extended
partition. We can have unlimited logical partitions
inside an extended partition.
We can read MSDOS partitions used
fdisk
command. - GPT
- These are new types of partition tables in use by modern
OS by default. For working with GPT file-systems we must
use
parted
command.
Since we will be dealing with small pen drives as part of the lecture demos, we will restrict ourselves to use of older msdos partition tables only.
5 Design and structure of FAT32 file system
Fat32 filesystem has following components:
- In first 512 bytes indicating various parameters size, type, volume label, volume ID, etc. of file-system.
- Starting at second sector for quick values on next free data cluster and no. of free clusters.
- Remaining reserved sectors
- File Allocation Table(s)
- Data Region
5.1 Boot Sector
- First three bytes are jump instruction - a short jump followed by a NOP (opstring sequence 0xEB 0x?? 0x90) or a near jump (0xE9 0x?? 0x??)
- Next eight bytes are OEM name
- FAT32 Extended BIOS Parameter Block (60 or 79 bytes). This extended BPB contains DOS 3.31BPB which in turns contains DOS 2.0 BPB. Values in these structures are described ahead.
- 2 bytes for bytes per logical sector. Typically 512.
- 1 bytes for logical sectors per cluster. Typically 8 for 4kb block.
- 2 byte for count of reserved logical sectors. Typically 32 for FAT32.
- 1 byte for number of file-allocation tables. Typically 2.
- 2 bytes for number of root directory entries. In Fat32 this is 0 as root directory is also stored in data region.
- 2 bytes for number of logical sectors. If zero use 4-byte value at later stage. (Point 15)
- 1 byte for media descriptor. 0xf8 for fixed disks.
- 2 bytes for logical sectors per fat. 0 for FAT32 as fat32 uses a 4-byte value described later. (Point 16)
- 2 bytes for physical sectors per track for CHS addressing.
- 2 bytes for no. of heads for CHS addressing
- 4 bytes of count of hidden sectors before FAT partition
- 4 bytes of total logical sectors instead of 16-bit value at point 9
- 4 bytes of logical sectors per fat. (Use this in case of fat32 instead of value at point 11)
- 2 bytes of drive description
- 2 byte version. Should be 0.0
- Cluster no. of root directory start. Typically 2.
- 2 byte for sector no. for FS information sector. Typically 1.
- 2 byte for first logical sector number of a copy of three FAT32 boot sectors. Typically 6.
- 12 reserved bytes. Should be 0.
- Ignore next 2 bytes
- 1 byte value of 0x29 to indicate FAT32 extended boot signature
- 4 bytes of Volume ID
- 11 bytes of Volume Label
- 8 bytes of FS type
5.2 FS Information Sector
- Four bytes signature (0x52 0x52 0x61 0x41 = "RRaA")
- Next 480 bytes are reserved (0x00)
- Four bytes signature (0x72 0x72 0x41 0x61 = "rrAa")
- Four bytes of last known number of free data clusters
- Four bytes of number of most recently allocated data cluster. 0xffffffff after format. Should start from 0x02 onwards.
- Another 12 reserved bytes to be set to zero
- Four bytes of signature (0x00 0x00 0x55 0xAA)
5.3 File allocation table.
FAT32 file allocation table has 32 bit address for each entry in little endian order. First four bits of this entry are reserved, should be set to 0 during formatting and should not be changed otherwise. (First four bits of fourth byte from left to right).
Each 32-bit value can be one of the following:
- Decimal 0 to indicate cluster is unused
- Decimal 1 to indicate special reserved cluster
- Cluster number of next cluster for this entry (0200 0000 to efff ff0f) (Read as 0000 0002 to 0fff ffef)
- Reserved (0fff ff0f to f6ff ff0f (Read as 0fff fff0 to 0fff fff6)
- Special values to mark bad cluster (f7ffff0f) (Read as 0fff fff7)
- Special value to mark end of cluster (f8ff ff0f to ffff ff0f) (Read as 0fff fff8 to 0fff ffff)
5.4 Data region
Data region has either file or directory contents. File contents are straight forward. Length of file comes from directory listing. Further next cluster location of files longer than 4096 bytes comes from FAT. Thus there is easy direct way to read files from data region.
Directories on the other hand contain considerable information. They use following structure:
- First 8 bytes is short file name padded with spaces.
- If filename starts with 0x00 that means the entry is unused and
there is nothing used after this.
- If filename starts with 0xe5 that means the file has been deleted.
- Next 3 bytes are short file extension
- Next byte is file attribute
Bit Significance location & mask 1 (0x80) Reserved 2 (0x40) Device 3 (0x20) Archive 4 (0x10) Sub-directory 5 (0x08) Volume label 6 (0x04) System 7 (0x02) Hidden 8 (0x01) Readonly - 1 byte for indicating whether EAS (Extended attributes are used or not)
- 1 byte for 10ms time from 1 to 199
- 2 bytes for creation time as follows
Bits Description 15-11 Hours(0-23) 10-5 Minutes(0-59) 4-0 Seconds (0-29) - 2 bytes for creation date as follows:
Bits Description 15-9 Year(0=1989,max 127=2107) 8-5 Month(1-12) 4-0 Day(1-31) - 2 bytes for last access date
- High 2 bytes of first cluster number in FAT32h
- 2 bytes for last modified time
- 2 bytes for last modified date
- Low 2 bytes of first cluster number in FAT32
- 4 bytes file size
5.5 Long filename entries
If file attribute is 0x0f then it indicates VFAT long filename, which has following structure:
Locate | Size | Description |
---|---|---|
(in bytes) | ||
0x00 | 1 | Sequence Number (bit 6: last logical, first physical LFN entry, |
bit 5: 0; bits 4-0: number 0x01..0x14 (0x1F), deleted entry: 0xE5) | ||
0x01 | 10 | Name characters (five UCS-2 characters) |
0x0B | 1 | Attributes (always 0x0F) |
0x0C | 1 | Type (always 0x00 for VFAT LFN) |
0x0D | 1 | Checksum of DOS file name |
0x0E | 12 | Name characters (six UCS-2 characters) |
0x1A | 2 | First cluster (always 0x0000) |
0x1C | 4 | Name characters (two UCS-2 characters) |
6 Demo - Creating fat32 file system on a pen drive
Following steps can be used to understand little bit about fat32 file-systems:
- Connect the pen drive to system.
- Use
fdisk -l
to see that device has been detected. See the overall size (4GB, 8GB, etc.) as secondary verification. - Zero the device to be used for experiments using:
dd if=/dev/zero of=/dev/sdb #If you get impatient use "sudo killall -9 dd; sync" to terminate dd command
Use this only if the pen drive has no useful information and pen drive device is indeed /dev/sdb. Many people have lost all there data by using this type of system command incorrectly.
- Check that at least initial blocks have been zeroed
xxd /dev/sdb | more
Example output
0000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- Create a single primary partition with fat label on the device
fdisk /dev/sdb n p 1 p t 1 b p w fdisk -l
Here:
n
is to create a new partitionp
for indicating that new partition should be a primary partition1
for creating first primary partition- two enter for default start and end sector numbers for the new first primary partition being created
t
for changing file-system type1
for changing file-system type for 1st primary partitionb
for setting new file-system type to be fat32p
for printing new partition table after creating a new partition and changing its file-system typew
for saving and quitingfdisk -l
for printing new partition table on all devices. Note that it would understand only msdos partition tables.
Example input/output:
[root@barjatiyarklp ~]# fdisk /dev/sdb Welcome to fdisk (util-linux 2.23.2). Changes will remain in memory only, until you decide to write them. Be careful before using the write command. Device does not contain a recognized partition table Building a new DOS disklabel with disk identifier 0xd0bfdbcc. Command (m for help): n Partition type: p primary (0 primary, 0 extended, 4 free) e extended Select (default p): p Partition number (1-4, default 1): 1 First sector (2048-7821311, default 2048): Using default value 2048 Last sector, +sectors or +size{K,M,G} (2048-7821311, default 7821311): Using default value 7821311 Partition 1 of type Linux and of size 3.7 GiB is set Command (m for help): p Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk label type: dos Disk identifier: 0xd0bfdbcc Device Boot Start End Blocks Id System /dev/sdb1 2048 7821311 3909632 83 Linux Command (m for help): t Selected partition 1 Hex code (type L to list all codes): b WARNING: If you have created or modified any DOS 6.xpartitions, please see the fdisk manual page for additionalinformation. Changed type of partition 'Linux' to 'W95 FAT32' Command (m for help): p Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk label type: dos Disk identifier: 0xd0bfdbcc Device Boot Start End Blocks Id System /dev/sdb1 2048 7821311 3909632 b W95 FAT32 Command (m for help): w The partition table has been altered! Calling ioctl() to re-read partition table. Syncing disks. [root@barjatiyarklp ~]# fdisk -l Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disk label type: dos Disk identifier: 0x34a0032f Device Boot Start End Blocks Id System /dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT /dev/sda2 206848 184322047 92057600 7 HPFS/NTFS/exFAT /dev/sda3 184322048 368642047 92160000 7 HPFS/NTFS/exFAT /dev/sda4 368642048 976773167 304065560 5 Extended /dev/sda5 368644096 471044095 51200000 83 Linux /dev/sda6 471046144 503814143 16384000 82 Linux swap / Solaris /dev/sda7 503816192 976773119 236478464 83 Linux Disk /dev/sdb: 4004 MB, 4004511744 bytes, 7821312 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk label type: dos Disk identifier: 0xd0bfdbcc Device Boot Start End Blocks Id System /dev/sdb1 2048 7821311 3909632 b W95 FAT32
- First partition information would start at bytes 446 (1be in
hex) and would contain 16 bytes of information.
- 1 bit of 1 byte indicates whether partition is bootable. Other bits of 1 byte are unused.
- Next 3 bytes are for CHS address of first absolute sector (1byte head,(9-8) 2 high bit cylinder, 6 low bits for sector, 8 remaining bits for cylinder)
- Then 1 byte of partition type
- Next 3 bytes are for CHS address of last absolute sector
- Then 4 bytes of LBA of first absolute sector in partition
- Then 4 bytes of number of sectors in partition
Note that with 512 bytes per sectore this limits size of partition to 2TB and size of disk to 4TB Example partition 1 information:
00001b0: 0000 0000 0000 0000 ccdb bfd0 0000 0021 ...............! 00001c0: 0300 0b2a ccf9 0008 0000 0050 7700 0000 ...*.......Pw...
Here, we can see at
- 1be -> 00 Since partition is not bootable
- 1bf-1c2 -> 0x210300 CHS address of first sector of partition,
should be ignored. 0x21 is 33 in Decimal.
- (Read later) With 62 (0x3e) sectors per track. We get 33*62 = 2046. 2046+3 = 2049 is the first sector of partition.
- 1c3 -> 0b Partition type for fat32 we wrote using fdisk
- 1c4-1c6 -> 2accf9 CHS address of last sector of partition, should be ignored
- 1c7-1ca -> LBA address of start of partition -> 0000 0800 (Read right to left due to enddianness) = 2048 in decimal
- 1cb-1ce -> Length of partition -> 00775000 = 7819264 in base 10. This is same as 7821311-2048+1.
- Since we have choosen 2048 sector as first sector of first
primary partition, we can read values of this partition using:
xxd -s +1048576 /dev/sdb | more
Here, 1048576 is 2048 * 512 (0010 0000 in hex). Example output would be:
[root@barjatiyarklp ~]# xxd -s +1048576 /dev/sdb | more 0100000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100040: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100050: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100060: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100070: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100080: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100090: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100100: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100110: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100120: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100130: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100140: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100150: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100160: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100170: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100180: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100190: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- Creating fat filesystem in first primary partition using:
mkfs.fat /dev/sdb1
- Again look at first primary partition data using:
xxd -s +1048576 /dev/sdb | more
Here, 1048576 is 2048*512
Example output would be:
0100000: eb58 906d 6b66 732e 6661 7400 0208 2000 .X.mkfs.fat... . 0100010: 0200 0000 00f8 0000 3e00 7c00 0000 0000 ........>.|..... 0100020: 0050 7700 c61d 0000 0000 0000 0200 0000 .Pw............. 0100030: 0100 0600 0000 0000 0000 0000 0000 0000 ................ 0100040: 0000 2992 685c 474e 4f20 4e41 4d45 2020 ..).h\GNO NAME 0100050: 2020 4641 5433 3220 2020 0e1f be77 7cac FAT32 ...w|. 0100060: 22c0 740b 56b4 0ebb 0700 cd10 5eeb f032 ".t.V.......^..2 0100070: e4cd 16cd 19eb fe54 6869 7320 6973 206e .......This is n 0100080: 6f74 2061 2062 6f6f 7461 626c 6520 6469 ot a bootable di 0100090: 736b 2e20 2050 6c65 6173 6520 696e 7365 sk. Please inse 01000a0: 7274 2061 2062 6f6f 7461 626c 6520 666c rt a bootable fl 01000b0: 6f70 7079 2061 6e64 0d0a 7072 6573 7320 oppy and..press 01000c0: 616e 7920 6b65 7920 746f 2074 7279 2061 any key to try a 01000d0: 6761 696e 202e 2e2e 200d 0a00 0000 0000 gain ... ....... 01000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100100: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100110: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100120: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100130: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100140: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100150: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100160: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100170: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100180: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100190: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01001f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
Here, we can see that
Address Value Description range 00-02 0xeb5890 Short jump ahead additional 0x58 (88 in decimal) followed by NOP 03-0a mkfs.fat OEM Name Then 79 bytes of Fat32 Ext. BPB of which first 25 bytes are DOS 3.31 BPB. DOS 3.31 BPB first 13 bytes are DOS 2.0BPB. 0b-0c 0x0200 Number of bytes per logical sector (512 in decimal) (From DOS 2.0BPB) 0d 0x08 Logical sectors per cluster 0e-0f 0x0020 Number of reserved logical sectors (32 in decimal) 10 0x02 Number of file allocation tables 11-12 0x0000 Number of root directory entries 13-14 0x0000 Number of logical sectors (16-bit value) 15 0xf8 Media descriptor 16-17 0x0000 Number of logical sectors per fat (16-bit value) 18-19 0x003e Physical sectors per track. (62 in decimal) 1a-1b 0x007c Number of heads. (124 in decimal) 1c-1f 0 Number of hidden sectors before FAT partition 20-23 0x775000 Total logical sectors (7819264 in decimal) 24-27 0x1dc6 Logical sectors per fat (7622 in decimal) 28-29 0x0000 Drive description 2a-2b 0x0000 Version (0.0) 2c-2f 2 Cluster no. of root directory start 30-31 0x0001 Logical sector no. of FS Information Sector 32-33 0x0006 First logical sector no. of a copy of three FAT32 boot sectors 34-3f Reserved should be (zero) 40 Ignore 41 Ignore 42 0x29 0x29 indicates extended boot signature 43-46 Volume ID (0x92685c47) 47-51 Volume Label (NO NAME) 52-59 Filesystem type (FAT32) - Look at FS Information Sector using
xxd -s +1049088 /dev/sdb | more
Here 1049088 is 2048*512+512
The values are:
0100200: 5252 6141 0000 0000 0000 0000 0000 0000 RRaA............ 0100210: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100220: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100230: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100240: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100250: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100260: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100270: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100280: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100290: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01002f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100300: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100310: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100320: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100330: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100340: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100350: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100360: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100370: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100380: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0100390: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01003a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01003b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01003c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 01003e0: 0000 0000 7272 4161 89e2 0e00 0200 0000 ....rrAa........ 01003f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
Here, various signatures are matching. Here Last known number of free data clusters is 0x000ee289 which is 975,497 in decimal. Each cluster has 8 sectors so 7,803,976 sectors. Length of partition is 7,819,264 leaving 15288 sectors for Boot sector, FS information block, reserved, etc. purposes.
Number of most recently used data cluster is set to 0x02 from where root directory should start.
- Look at first fat using
xxd -s +1064960 /dev/sdb | more
Here 1064960 is 2048*512+32*512 where 32 sectors were reserved.
Note that 975,497 free clusters were mentioned in FS Information sector. These many clusters required 3,901,988 bytes of space assuming 4byte per cluster for FAT. That means almost 7622 sectors for one FAT. This value matches what we read in table before at addresses 24-27, hex value 0x1dc6 for logical sectors per fat (7622 in decimal). Since we keep two FATs we have 15244 sectors just for storing FAT.
Total sectors for FAT, reserved etc. after removing data clusters was 15288. 15288(Total)-15244(FAT*2)-32(Reserved)=12. Thus, only four 14 sectors are unused. Minimum 8 are required to make one more usable cluster.
Example output is:
0104000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000 ................ 0104010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- Lets look at first data region
xxd -s +8869888 -l 10000 /dev/sdb | less
2048(Partition offset) +32 (Reserved) +15244 (FAT) = 17324. 17324*512=8869888.
Should be all 0 as we do not have any files or directories inside root directory either.
- Mount the partition on a test folder
mkdir /mnt/test mount /dev/sdb1 /mnt/test
- Create a file a.txt with word hello
echo "hello" > /mnt/test/a.txt
- Umount the pen drive
umount /mnt/test
- Again look at first fatdata region:
xxd -s +1064960 /dev/sdb | more
The fat table might look like:
0104000: f8ff ff0f ffff ff0f f8ff ff0f ffff ff0f ................ 0104010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
This indicates that root directory at cluster 2 ends at cluster 2 and that there is something else at cluster 3 that also ends at cluster 3. Other clusters look unused.
Look at data region using:
xxd -s +8869888 -l 10000 /dev/sdb | less
The directory entry might look like:
0875800: 4161 002e 0074 0078 0074 000f 005d 0000 Aa...t.x.t...].. 0875810: ffff ffff ffff ffff ffff 0000 ffff ffff ................ 0875820: 4120 2020 2020 2020 5458 5420 0064 94b5 A TXT .d.. 0875830: 544b 544b 0000 94b5 544b 0300 0600 0000 TKTK....TK...... 0875840: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0875850: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Here, we have two directory entries
- First entry file attribute at byte 12 is 0x0f. Hence it is a
VFAT long file name.
- Byte 1 contains 0x41 indicating that this is the first entry (Sequence no. 1) and also that this is the last entry (Sixth bit 0x40 is set)
- Byte 2-0a contain first five characters of filename in 2-byte character (UCS-2) format. (a.txt)
- Byte 0b is 0x0f
- Byte 0c is 00
- Byte 0d is checksum
- Byte 0e onwards for 12 bytes there are remaining 6 UCS-2 characters. Here first character is 0x0000 and all others are 0xffff means that file name is already completed by before bytes.
- Byte 1a onwards for 2 bytes indicates next cluster 0x0000
- Byte 1c onwards has 2 UCS-2 characters taking 4 additional bytes.
- Second entry file attribute at byte 12 is 20. This
indicates that it is a normal file and not a directory.
Further archive bit is set indicating that file has been
modified since last backup.
- Byte 1 contains name A followed by 7 spaces for another 7 bytes
- Byte 8-0a contain extension TXT
- Byte 0b contains 0x20 file attribute
- Byte 0c contains 0x00 indicating EAS extension are not used
- Byte 0d contains 0x64 indicating 100*10ms or 1 second.
- Byte 0e-0f contain 0xb594 (1011 0101 1001 0100) indicating:
- 10110 Hours = 22
- 101100 Minutes = 44
- 10100 Seconds = 20 (40 seconds)
- Byte 10-11 contain 0x4b54 (0100 1011 0101 0100) indicating:
- 0100101 Years = 37 (1980+37=2017)
- 1010 Month = 10
- 10100 Day = 20
- Byte 12-13 contain 0x4b54 indicating 20-10-2017 as last
access date
- Byte 14-15 contain high two bytes of first cluster 0x0000
- Byte 16-17 contain last modification time 0xb594 which is 22:44:40
- Byte 18-19 contain last modification date 0x4b54 which is 20-10-2017
- Byte 1a-1b contain low two bytes of file cluster 0x0003 which indicates cluster no. 3
- Bytes 1c-1f contain 0x06 which indicates file size of 6 bytes
We can verify that date, time and file size were read correctly using:
[root@barjatiyarklp ~]# mount /dev/sdb1 /mnt/test/ [root@barjatiyarklp ~]# ls -l /mnt/test/ --full-time -c total 4 -rwxr-xr-x 1 root root 6 2017-10-20 22:44:41.000000000 +0530 a.txt [root@barjatiyarklp ~]# ls -l /mnt/test/ --full-time htotal 4 -rwxr-xr-x 1 root root 6 2017-10-20 22:44:40.000000000 +0530 a.txt [root@barjatiyarklp ~]# umount /mnt/test/
Further, if try to look at third cluster (2048+32+15244+(3-2)*8)=17332. 17332*512=8873984:
xxd -s +8873984 /dev/sdb | more
You should get below output:
0876800: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello........... 0876810: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- First entry file attribute at byte 12 is 0x0f. Hence it is a
VFAT long file name.
- If we delete the file and try to read data region then
mount /dev/sdb1 /mnt/test rm -f /mnt/test/a.txt umount /mnt/test
xxd -s +1064960 /dev/sdb | more
Fat will look like:
0104000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000 ................ 0104010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Second cluster (root directory) data region has:
xxd -s +8869888 -l 10000 /dev/sdb | less
0875800: e561 002e 0074 0078 0074 000f 005d 0000 .a...t.x.t...].. 0875810: ffff ffff ffff ffff ffff 0000 ffff ffff ................ 0875820: e520 2020 2020 2020 5458 5420 0064 94b5 . TXT .d.. 0875830: 544b 544b 0000 94b5 544b 0300 0600 0000 TKTK....TK...... 0875840: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Here filenames starting with 0xe5 indicate that entry has been deleted
Third cluster value is:
xxd -s +8873984 /dev/sdb | more
0876800: 6865 6c6c 6f0a 0000 0000 0000 0000 0000 hello........... 0876810: 0000 0000 0000 0000 0000 0000 0000 0000 ................
7 Take aways so far
- 32-bit length of partition in msdos partition table with 512 byte sector limits partition size to 2TB
- 32-bit file-size in FAT32 limits filesize to 4GB
- When a file is deleted in fat entire contents are saved. Directory information of file simple indicates file is deleted. Thus, restoring file on FAT is possible, provided the same has not been overwritten.
- We can learn to read FAT filesystem directory at binary level with some effort
- Files on FAT filesystem may not necessarily get stored contiguously. This should be remembered when conducting forensic analysis sector by sector (or cluster by cluster)
- During forensics if you can determine cluster size (4kb, 8kb,
etc.). Then we can search for filetype specific magic start value
near cluster start without looking at entire cluster data.
Try:
less /usr/share/misc/magic file /usr/share/pixmaps/fedora-logo.png
8 Ext3 file-system design
It is not possible to go to bit level design of various aspects of mature file-systems such as ext3 in a single session. Interested readers can use references to get internal details of ext3 file-system. As part of this session we will only look at high-level design
8.1 Ext3 Disk inodes
An ext3 file-system is inode based. An ext3 disk inode stores following information:
- File owner and group owner information (uid, gid)
- File type - There are types other than files, directories such as pipes and sockets
- Permissions (rwx for user owner, group owner and others)
- Access, modification and creation timestamps
- Number of links to the file
- 12 (Index 0:11) direct data block addresses, 1 indirect data block address, 1 double-indirect data block address and 1 triple indirect data block address
Each disk inode is fixed 128 bytes in size. Note that in-memory inodes also have additional information such as no. of open handlers to file, which are not described above.
Please refer to https://en.wikipedia.org/wiki/Ext2 for understanding block address pointers.
8.2 Directory
An ext3 directory contains following information
- Inode number
- Directory entry length (Typically mutiple of 4)
- Filename length
- File type
- File name. If filename is not exact multiple of 4 it might get padded with 0x00 (NULL) on the right till the directory entry length is achieved.
8.3 Differences between ext3 and fat
- In case of FAT directory entry contains timestamps, file length, etc. In case of ext3 directory entry contains only two things - Inode number and file name. Rest all information is stored in inodes.
- In FAT information about next cluster in data region in file is stored in FAT. In ext3 inode contains pointers (direct, indirect, double indirect, etc.) to data blocks used by particular file.
- We can decide number of inodes while creating the file-system. It is not fixed upon block size or partition size parameters.
- In ext3 it is possible to have hard-links which is not possible in FAT. For hard-links two different directory entries with different file-names will point to the same inode. That is why inode needs to keep track of number of entries pointing to it.
- In ext3 it is possible to have sparse files. This is possible by having previous data block address as 0, indicating entire data-block of zeros (Typically 4k).
9 NTFS file-system design
Similar to ext3 we will only have a high-level design overview of NTFS file-system.
NTFS uses Master File table to store filesystem information. In case of MFT small files and directories (Smaller than 512 bytes) can be stored directly inside MFT without requiring to refer to a data block. This can save considerable wastage of space as slack space.
Other interesting things about NTFS file system are:
- NTFS file system supports compressed and encrypted files.
- In case NTFS we can give different types of permission to a file
for each user or group (SID). For each SID we can given
allow/deny along with type of access read, write, full, etc.
This is different from ext3 where without POSIX ACLs typically
only user owner and group owner permission can be given.
There is also provision to inherit ACLs from parent folders. There are separate set of ACLs for auditing.
- NTFS uses transaction journal to keep record of changes being done to file-system. If a change is completed properly the same is recorded. Else during hard reboot NTFS can see actions under progress which did not complete properly and abort them, ensuring that file-system is always sane.
- NTFS similar to FAT can get fragmented and requires defragmentation to improve its speed. In contrast ext3 algorithms are designed to avoid fragmentation during allocation of disk blocks.
10 References
Following books and links were used or would be useful in gaining further insight into file-systems:
- Design of the Unix Operating System book by Maurice Bach.
- https://en.wikipedia.org/wiki/Master_boot_record - Information about MBR to understand location of various primary partitions in case of MSDOS partition table
- https://en.wikipedia.org/wiki/Cylinder-head-sector - To understand CHS structure
- https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system Design of fat filesystem
- http://www.nongnu.org/ext2-doc/ext2.html - Detailed ext2 file system design. Note that ext3 is backward compatible to ext2 with difference of journal.
- http://www.cs.uml.edu/~bill/cs516/ext2_3_filesystems_slides.pdf - Presentation on high level ext2 and ext3 file-system designs
- http://www.ntfs.com/ntfs_basics.htm - Detailed design of NTFS file system