
I am writing some code to parse through the MFT on disk in NTFS volumes. This is straightforward, but one particular corner case caught my eye, and I can't find a clear answer anywhere on the internet.

For normal files in NTFS it is possible to have multiple MFT records for a single file, if the file has more attributes than can fit in a single record (for example, many $FILE_NAME attributes if the file has many hard-links, or many $DATA attributes if it has many Alternate Data Streams).

The $MFT file at reference-number 0 holds the data runs for the MFT itself. Normally it is a single record with no children. Is it possible for the $MFT file to have child records? If it were possible, how would you know where to find them? Would those child records have to be stored with very low reference numbers so that you could reliably get to them without having to have parsed the $MFT already to know where they were on disk?

DSII

3 Answers

NTFS has a special attribute type called $ATTRIBUTE_LIST. A file or directory can have up to 65536 attributes, and those cannot all fit into a single MFT entry. The attribute list is essentially a list of all of the file's attributes except itself. Each entry in the list contains the attribute type and the MFT reference of the record where that attribute can be found. In the other direction, the base file reference field in a child record's file record header points back to the base record.

If the list gets too big for an MFT entry, the attribute can become non-resident, and the list is then found by interpreting the attribute's data runs.
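As a rough illustration of what walking such a list looks like, here is a minimal Python sketch. The 26-byte entry header layout used here (type, entry length, name length, name offset, starting VCN, file reference, attribute id) follows the commonly circulated Linux-NTFS documentation rather than anything stated in this answer, so treat it as an assumption. `AttrListEntry` and `parse_attribute_list` are hypothetical names.

```python
import struct
from typing import Iterator, NamedTuple


class AttrListEntry(NamedTuple):
    attr_type: int      # e.g. 0x80 for $DATA
    name: str           # empty string for unnamed attributes
    starting_vcn: int   # first VCN covered by this piece of the attribute
    record_number: int  # low 48 bits of the MFT file reference
    sequence: int       # high 16 bits of the MFT file reference
    attr_id: int


# Assumed layout: u32 type, u16 entry length, u8 name length (in characters),
# u8 name offset, u64 starting VCN, u64 file reference, u16 attribute id.
_HEADER = struct.Struct("<IHBBQQH")


def parse_attribute_list(buf: bytes) -> Iterator[AttrListEntry]:
    """Yield the entries of an $ATTRIBUTE_LIST value (resident or already read from its runs)."""
    pos = 0
    while pos + _HEADER.size <= len(buf):
        attr_type, length, name_len, name_off, vcn, ref, attr_id = _HEADER.unpack_from(buf, pos)
        if length < _HEADER.size or pos + length > len(buf):
            break  # corrupt or truncated entry: stop rather than guess
        raw_name = buf[pos + name_off: pos + name_off + 2 * name_len]
        yield AttrListEntry(
            attr_type,
            raw_name.decode("utf-16-le"),
            vcn,
            ref & 0xFFFFFFFFFFFF,  # record number
            ref >> 48,             # sequence number
            attr_id,
        )
        pos += length
```

For the $MFT question, the interesting entries are the ones with `attr_type == 0x80` and an empty name: their `record_number` tells you which record holds each piece of the MFT's own $DATA run list.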

Because the type of $ATTRIBUTE_LIST is 32 (0x20), it is usually placed right after the $STANDARD_INFORMATION attribute, and it will list attributes with greater types (like $FILE_NAME or $DATA).

When a file becomes very fragmented, the run list of its $DATA attribute may no longer fit in a single MFT entry. This is another case where $ATTRIBUTE_LIST is used, so that the $DATA attribute can be stored across multiple entries.
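To make the size pressure concrete, here is a minimal Python sketch of decoding a run list, assuming the usual NTFS encoding: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is a signed delta from the previous run's starting cluster, and a zero header byte ends the list. `decode_runlist` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def decode_runlist(buf: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode an NTFS run list into (starting LCN or None for a sparse run, length in clusters)."""
    runs: List[Tuple[Optional[int], int]] = []
    pos = 0
    lcn = 0  # offsets are relative to the previous run's starting LCN
    while pos < len(buf) and buf[pos] != 0:
        header = buf[pos]
        len_size = header & 0x0F  # bytes used by the run length
        off_size = header >> 4    # bytes used by the LCN delta
        pos += 1
        length = int.from_bytes(buf[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            # No offset field: a sparse run with no clusters on disk.
            runs.append((None, length))
        else:
            delta = int.from_bytes(buf[pos:pos + off_size], "little", signed=True)
            lcn += delta
            runs.append((lcn, length))
        pos += off_size
    return runs


# Two runs: 0x18 clusters at LCN 0x5634, then 0x10 clusters 16 clusters earlier.
print(decode_runlist(bytes.fromhex("21183456" "1110F0" "00")))
# -> [(22068, 24), (22052, 16)]
```

Every fragment costs a few bytes in the record, so a heavily fragmented file eventually needs more run-list bytes than one MFT entry can hold.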

The $MFT entry rarely has this problem, since the allocation algorithm is designed to prevent it. But if the $MFT of a volume becomes very fragmented, it can need more than one entry to store its $DATA.

  • Thank you very much! I know about attribute lists, but I still don't understand how this could work for the $MFT. Suppose you have a large fragmented $MFT file with very many $DATA attributes, listed in the attribute list. The attribute list would normally contain the child record reference number. But how would you know where to find that child record? Normally you would parse the MFT to find the right run on disk. But you're trying to get the MFT's data itself, so you can't parse the MFT yet to find the location of the MFT's own data! – DSII Jun 05 '15 at 20:34
  • The only way out is if the child record must have a small enough reference number that it will always live in the first MFT run, or if the MFT record cannot have children. – DSII Jun 05 '15 at 20:34
  • @DSII I couldn't find any references in either the Linux NTFS docs, Brian Carrier's forensics book or MSDN docs for this. I don't think this ever happens since the `$DATA` attribute in one entry could describe the next run of the `$MFT` for the other entries. So you would have to both parse the entries and the data run for them at the same time, so that you can find them by reference in runs you already know about. This is a very extreme and unlikely case. The most fragments for a `$MFT` on a 5-year old drive were 3. – Sebastian-Laurenţiu Plesciuc Jun 06 '15 at 11:13
  • @DSII also, some documents mention that the `$MFT` is found after the first few clusters of the volume and has an allocation delta equal to half the size of the volume. And basically that delta halves when clusters meant for file contents get filled. So in the end the `$MFT` shouldn't be fragmented. However I tested this on Windows 8 and I found files that have contents indexed to clusters that precede the `$MFT`. But, on a Windows 7 machine and a Vista machine the delta thing seems to be happening. Something changed in the NTFS.sys but I don't know what. – Sebastian-Laurenţiu Plesciuc Jun 06 '15 at 11:16
  • I agree that it's an unlikely case, but was an interesting one, so I thought I'd ask. (Also, it would break my parser if it were ever possible!) I agree that you're right that any $DATA attributes (even more than one) in the base record will give you enough information. However, if there are so many $DATA attributes that they have to be outsourced to an attribute list, I still don't see a way out here. The only possible conclusion is that they go to great extremes to ensure the MFT does not fragment this much. – DSII Jun 06 '15 at 12:12

tl;dr: Yes; I believe this is what ERROR_DISK_TOO_FRAGMENTED/STATUS_MFT_TOO_FRAGMENTED are for.

To elaborate:

The MFT file can most certainly have child records. If you need to construct one like this, just open \$MFT (do it on a RAM disk unless you want to mess up a physical volume...) and then FSCTL_MOVE_FILE each entry, alternating between the beginning and the end of the volume. You'll severely fragment the MFT and cause it to generate an $ATTRIBUTE_LIST, such that it won't even fit into the last 4 of the 16 initial records anymore. It'll overflow into later slots.

Logic dictates, however, that the MFT needs to be bootstrappable. Thus I can only conclude each child described by an $ATTRIBUTE_LIST entry must be in a slot from a previous extent. As such, it is possible to run into a situation where the volume has enough free space to grow the MFT, but no free slots to describe the MFT's next extent. I think this is one situation where the driver would return STATUS_MFT_TOO_FRAGMENTED.
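The bootstrapping constraint can be sketched in Python as below. This is only an illustration of the reasoning above, not confirmed driver behavior: it assumes each child record of $MFT lies inside an extent that is already known when its turn comes, that records do not straddle run boundaries, and that record and cluster sizes are 1024 and 4096 bytes. `locate_record`, `bootstrap_mft_runs`, and the `read_child_runs` callback are all hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Run = Tuple[int, int]  # (starting LCN, length in clusters)


def locate_record(record_number: int, runs: List[Run],
                  record_size: int = 1024, cluster_size: int = 4096) -> Optional[int]:
    """Return the volume byte offset of an MFT record, or None if no known run covers it."""
    offset = record_number * record_size  # byte offset inside the $MFT file
    for lcn, length in runs:
        run_bytes = length * cluster_size
        if offset < run_bytes:
            return lcn * cluster_size + offset
        offset -= run_bytes
    return None


def bootstrap_mft_runs(base_runs: List[Run],
                       children: List[Tuple[int, int]],
                       read_child_runs: Callable[[int], List[Run]],
                       record_size: int = 1024,
                       cluster_size: int = 4096) -> List[Run]:
    """Rebuild the full $MFT run list.

    base_runs: runs decoded from the $DATA attribute in record 0.
    children: (starting VCN, child record number) pairs taken from $MFT's
        $ATTRIBUTE_LIST for the $DATA pieces stored outside record 0.
    read_child_runs: caller-supplied function that reads the record at a
        volume byte offset and returns the runs of its $DATA piece.
    """
    runs = list(base_runs)
    for starting_vcn, child in sorted(children):
        disk_offset = locate_record(child, runs, record_size, cluster_size)
        if disk_offset is None:
            # The child is not reachable from the extents found so far, so
            # the MFT cannot be bootstrapped under the assumption above.
            raise ValueError(f"child record {child} lies outside the known $MFT extents")
        runs.extend(read_child_runs(disk_offset))
    return runs
```

Processing the pieces in starting-VCN order matters: each child may only become reachable after the previous piece has extended the list of known runs.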

Good luck writing an efficient parser for this; it's rather tedious.

(n.b. It's possible but harder to fragment the $ATTRIBUTE_LIST itself too. But I read that its run-list must fit within a single record, so this imposes a hard limit on the number of fragments.)

user541686

Yes, for sure.

I ran into a disk drive with an 18 GB $MFT file. Because Windows splits the $MFT file into 200 MB chunks, there was not enough space left in the $MFT file record #1 to store all the data runs inside a single $DATA attribute.

I found them in file record #15 (one of the "unused" file records, if you go by the NTFS documentation on the internet). This file record contained a header and a single $DATA (0x80) attribute with the 104 (!) data runs.

$MFTBrowser of file record #15 for a 18 Gb $MFT file

Tristan CHARBONNIER