Windows INDX Slack Parser

(version 0.14)

WARNING:
Do NOT modify this file in any way OR rename the file.
Doing so will invalidate the binary authentication.



END USER LICENSE AGREEMENT (EULA)
for software on TZWorks, LLC Website www.tzworks.net

1. USER AGREEMENT

Permission to use the TZWorks, LLC website ("Website") and downloadable software that is made available on the Website ("Software") is for non-commercial, personal use ONLY. The User Agreement, Disclaimer, Website and/or Software may change from time to time. By continuing to use the Website or Software after those changes become effective, you agree to be bound by all such changes. Permission to use the Website and Software is granted provided that (1) use of such Website and Software is for non-commercial, personal use only and (2) the Website and Software is not resold, transferred or distributed to any other person or entity. To use the Software for commercial or business purposes, a separate license is required. Contact TZWorks, LLC (jon@tzworks.net) for more information regarding licensing. To redistribute the Software, approval in writing is required from TZWorks, LLC. These terms do not give the user any rights in intellectual property or technology, but only a limited right to use the Software for non-commercial, personal use. TZWorks, LLC retains all rights to ownership of all software and content ever made available on its Website.

DISCLAIMER

The user agrees that all Software made available on the Website is experimental in nature and use of Website and Software is at user's sole risk. The Software could include technical inaccuracies or errors. Changes are periodically added to the information herein, and TZWorks, LLC may make improvements and/or changes to Software at any time. TZWorks, LLC makes no representations about the accuracy or usability of the Software and/or Website for any purpose.

ALL SOFTWARE ARE PROVIDED "AS IS" AND "WHERE IS" WITHOUT WARRANTY OF ANY KIND INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL TZWORKS, LLC BE LIABLE FOR ANY KIND OF DAMAGE RESULTING FROM ANY CAUSE OR REASON, ARISING OUT OF IT IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THIS WEBSITE.

PRIVACY POLICY

When you use the Website or download the Software, we automatically record information regarding your activity using the Website. This may include the Internet Protocol ("IP") address and date and time stamps associated with transactions. Personal information and identifiable data is only collected if you supply it to TZWorks, LLC by posting a comment on the Website or submitting an email. We do not disclose any personally identifiable information without your permission unless we are legally entitled or required to do so or if we believe that such action is necessary to protect and/or defend our rights, property or personal safety and those of our users/customers etc.

SECURITY

The Website has security measures in place to protect the loss, misuse, and/or alteration of information under our control. The data resides behind a firewall, with access restricted to authorized TZWorks, LLC personnel. If you believe the Website and/or its software has been misused or has had a security breach, please email jon@tzworks.net. We will not be responsible for such misuse and do not guarantee that we will rectify any such security breach.

REMOVAL

The Website and Software are the original works of TZWorks, LLC. However, to be in compliance with the Digital Millennium Copyright Act of 1998 ("DMCA") we agree to investigate and disable any material for infringement of copyright. Contact TZWorks, LLC at email address: jon@tzworks.net, regarding any DMCA concerns.


About the 'wisp' Tool

wisp is a prototype Windows parser that targets NTFS index type attributes. The NTFS index attribute points to one or more INDX records. These records contain index entries that account for each item in a directory. An index entry represents either a file or a subdirectory and includes enough metadata to record the name, the modified/access/MFT changed/birth (MACB) timestamps, the size (if it is a file vice a subdirectory), as well as the MFT entry numbers of the item and its parent. The wisp tool, in its simplest form, is able to walk these structures, read the metadata, and report which index entries are present.

As a directory's contents change, the number of valid index entries grows or shrinks, as appropriate. As more directory entries are added, they will eventually exceed the existing INDX record allocation space. At this point, the operating system will allocate an additional INDX record in a 0x1000 (4096) byte chunk. Conversely, when entries are removed from the directory, the INDX record space is not necessarily deallocated. Thus, anytime the number of index entries shrinks, the invalid ones can potentially be harvested from the slack space. The slack space is defined to be the allocated but unused space. By comparing the valid entries with those still in the slack space, one can make some inferences about whether a file (or subdirectory) was present in the past.
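
To make the on-disk layout concrete, the following Python sketch parses a single index entry out of a raw INDX buffer. It is a minimal illustration, not wisp's internal code: the field offsets follow the commonly documented NTFS index entry and $FILE_NAME structures, and the function and field names are made up for this example.

```python
import struct
from datetime import datetime, timedelta

def filetime_to_dt(ft):
    # Windows FILETIME: 100-ns intervals since 1601-01-01 UTC
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)

def parse_index_entry(buf, off=0):
    """Parse one NTFS directory index entry keyed by a $FILE_NAME attribute.
    Illustrative sketch only; offsets follow the published NTFS layouts."""
    # Entry header: 8-byte MFT file reference, entry length, key length, flags
    mft_ref, entry_len, key_len, flags = struct.unpack_from("<QHHI", buf, off)
    fn = off + 16                      # $FILE_NAME key starts 16 bytes in
    (parent_ref, btime, mtime, ctime, atime,
     alloc_size, real_size, attr_flags) = struct.unpack_from("<8Q", buf, fn)
    name_len, namespace = struct.unpack_from("<BB", buf, fn + 64)
    name = buf[fn + 66 : fn + 66 + name_len * 2].decode("utf-16-le")
    return {
        "mft_entry": mft_ref & 0xFFFFFFFFFFFF,   # low 48 bits: entry number
        "seq": mft_ref >> 48,                    # high 16 bits: sequence number
        "parent_entry": parent_ref & 0xFFFFFFFFFFFF,
        "name": name,
        "modified": filetime_to_dt(mtime),
        "size": real_size,
        "entry_len": entry_len,
    }
```

Because each entry carries the MFT reference, the parent reference, the MACB timestamps, and the sizes inline, a slack entry alone is enough to recover all of this metadata about an item that is no longer listed in the directory.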

A good tutorial on harvesting index entries from INDX slack space can be found on Willi Ballenthin's webpage [4] and his DFIRonline presentation [5].

wisp uses the NTFS and index attribute parsing engine that is used in the ntfswalk tool [6] available from the TZWorks LLC website. Currently there are compiled versions for Windows, Linux and Mac OS-X.


How to use this Tool

While the wisp tool does not require administrator privileges to run, without them one is restricted to looking only at off-line 'dd' images. Therefore, to perform live processing of volumes, one needs to launch the command prompt with administrator privileges.

One can display the menu options by typing in the executable name with no parameters. From the available options, one can process NTFS INDX records with a handful of 'use-cases'. Specifically, wisp allows processing from any of these sources: (a) live volume, (b) 'dd' type image, (c) VMWare volume or (d) separately extracted INDX type record.

After selecting the source of the data, one can either: (a) process a single directory on the file system, (b) recursively process the subdirectories to some specified level, or (c) process all the index entries in an entire volume. Processing every directory in the entire volume is not explicitly shown in the above menu since it is the default option.

If one only wants a certain type of index entry, one can select: (a) just valid index entries, (b) just index entries in the slack space, or (c) both. For default output, one can select nothing; this will output the data as unstructured text. If parsable output is desired (or something that can be displayed in a spreadsheet application), one can select from three options that allow for structured output (CSV, log2timeline CSV, or SleuthKit body-file). Another useful option is the 'no duplicates' choice, which minimizes any redundancy in the output.


Parsing a Live Volume

To parse INDX entries from a live NTFS volume (or partition), one has two choices: (a) specify the volume directly by using the -partition <drive letter> option or (b) specify the drive number and volume offset by using the -drivenum <num> -offset <volume offset> options. Either choice accomplishes the same task. The first choice is more straightforward and easier to use. The second choice, while more complex, allows one to target hidden NTFS partitions that do not have a drive letter.

The next step is to decide what you want to target. The choices are: (a) a specific directory on the file system (specified by either the -mft or -path option), (b) a collection of subdirectories within a directory (how deep you wish to go is specified by the -level option) or (c) all directories (specified by the -scanall option). A couple of examples are shown below:

    wisp -partition c -path c:\$Recycle.Bin -level 2 -all -csv > results1.csv
    wisp -drivenum 0 -offset 0x100000 -all -csv > results2.csv

The first example targets the hidden directory of c:\$Recycle.Bin, and the -level 2 switch tells wisp to include any subdirectory in the analysis, up to 2 levels deep. The -all switch means both valid and invalid (slack) entries will be included in the output. Finally, the output is redirected to a file and the format is CSV.

The second example uses the same output options as the first, but now targets the first physical hard drive. The hex value 0x100000 is specified as the offset to the volume (or partition) we wish to analyze. For this example, this happens to be the hidden partition created during a Windows 7 installation. Since no -mft or -path option is explicitly listed, wisp infers that we want to traverse the entire volume, parsing all INDX records associated with it.


Parsing an Image File off-line

To process an image that has already been acquired and is in the 'dd' format, one uses the -image switch. This option comes in two flavors. If the image is of an entire drive, then one needs to explicitly specify the offset of the volume you wish to target. On the other hand, if the image is only of a volume, then you do not need to specify the offset (since the volume is presumed to be at offset 0).

For the first case, where an offset needs to be explicitly specified, wisp will help the user locate the NTFS volume offsets. If one issues the -image command without an offset, and there is not an NTFS volume at offset 0 (i.e., the image is of a drive rather than a volume), wisp will proceed to look at the master boot record contained in the image, determine where the NTFS partitions are, and report them to the user. This behavior is meant as an aid, so that one does not need to resort to other tools to determine the offsets of the NTFS volumes in an image.
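
This MBR scan is straightforward to approximate. The hypothetical Python helper below reads the partition table of a 'dd' drive image and reports the byte offsets of NTFS-typed primary partitions; real images may use GPT or extended partitions, which this sketch ignores.

```python
import struct

SECTOR = 512
NTFS_TYPES = {0x07}  # MBR partition type byte used by NTFS

def ntfs_volume_offsets(image_path):
    """Hypothetical helper: return byte offsets of NTFS-typed primary
    partitions found in the MBR of a raw drive image."""
    offsets = []
    with open(image_path, "rb") as f:
        mbr = f.read(SECTOR)
    if mbr[510:512] != b"\x55\xaa":
        return offsets                 # no valid MBR boot signature
    for i in range(4):                 # four primary entries at offset 0x1BE
        entry = mbr[0x1BE + i * 16 : 0x1BE + (i + 1) * 16]
        ptype = entry[4]               # partition type byte
        lba_start = struct.unpack_from("<I", entry, 8)[0]
        if ptype in NTFS_TYPES and lba_start:
            offsets.append(lba_start * SECTOR)
    return offsets
```

For instance, a Windows 7 hidden system partition starting at LBA 2048 would be reported at byte offset 2048 * 512 = 0x100000, matching the offset used in the earlier examples.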

Another nuance with using images as the source is when specifying a path to a directory within the image via the -path option. Since the image is not mounted as a drive, one should not associate it with a drive letter when specifying the path. If one does, wisp will ignore the drive letter and try to find the path starting at the root directory, which is at MFT entry 5 for NTFS volumes.

Below are two examples of processing 'dd' type images: (a) the first analyzes an entire volume at drive offset 0x100000 hex and (b) the second analyzes an image of a volume starting at the path "Users".

    wisp -image c:\dump\my_image.dd -offset 0x100000 -all -csv > results1.csv
    wisp -image c:\dump\vol_image.dd -path "\Users" -level 5 -all -csv > results2.csv

While the first example traverses the entire volume, the second starts at the "Users" directory and recursively processes the subdirectories up to 5 levels deep. Notice the second example does not specify an offset, since the image is of a volume (meaning the volume starts at offset 0) while the first is an image of a drive and the first NTFS volume starts at offset 0x100000 hex.

Both examples extract valid and invalid index entries as well as redirect their output to a file using CSV formatting.


Parsing an NTFS Volume Mounted on Linux or Mac OS-X

Sometimes you do not have a 'dd' image of a volume or drive, but instead have the physical hard drive you wish to analyze. If you are running wisp in Windows, one can mount the physical drive and follow the guidelines in the earlier section for parsing a live volume. However, if you are running wisp in Linux or Mac OS-X, you should be able to mount the target drive as well. Once it is successfully mounted, one uses the -image <device name of drive or volume> -offset <offset to desired volume, if a drive> options to access the appropriate NTFS volume. Below is an example of how to do this on a Mac box.

Assuming one has the proper setup with write blocker and hard drive shuttle, after connecting the Windows drive to the Mac, one can issue the diskutil list command to enumerate all the drives and volumes mounted on the machine. For this example, let's also assume the drive we mounted was labeled /dev/disk1 and its NTFS partition was /dev/disk1s1. From this data, one could issue the following command to wisp to analyze the partition.

    sudo wisp -image /dev/rdisk1s1 -all -csv -nodups > out.csv

Notice the 'sudo' in front of the wisp command. This allows wisp to run with administrator privileges to access the raw drive. Also note that /dev/rdisk1s1 is used. The 'r' (which is unique to Mac, in this case) specifies that we want to access the drive as raw I/O as opposed to buffered I/O. Buffered I/O is fine for normal reads and/or writes, but it is much slower when traversing in chunks aligned on sector boundaries.

Linux is similar to Mac, but instead of using the diskutil tool, one would use the df tool to enumerate the mounted devices.


Parsing a VMWare volume

Occasionally it is useful to analyze a VMWare image containing a Windows volume, both from a forensics standpoint as well as from a testing standpoint. This option is still considered experimental since it has only been tested on a handful of configurations. Furthermore, this option is limited to monolithic type VMWare images versus split images. In VMWare, the term split image means the volume is separated into multiple files, while the term monolithic virtual disk is defined to be a virtual disk where everything is kept in one file. There may be more than one VMDK file in a monolithic architecture, where each monolithic VMDK file would represent a separate snapshot. More information about the monolithic virtual disk architecture can be obtained from the VMWare website [10].

When working with virtual machines, the capability to handle snapshot images is important. Thus, if processing a VMWare snapshot, one needs to include the snapshot/image as well as its inheritance chain.

To handle an inheritance chain, wisp accepts multiple VMDK files via repeated -vmdk <file> switches, where each one represents a segment in the inheritance chain of VMDK files (e.g., -vmdk <VMWare NTFS virtual disk-1> ... -vmdk <VMWare NTFS virtual disk-x>).

Aside from the VMDK inheritance chain, everything else about this option is the same as for the normal 'dd' type images discussed in the previous section.


Extracting Clusters Associated with a Deleted Entry

At this point in the analysis, one may want to go deeper and determine whether a deleted file's data is still available by locating the 'cluster run' data associated with the MFT entry. Since INDX records do not contain any cluster run data for an index entry, one needs to take the MFT entry specified and use some other tool to read the file record associated with that MFT entry. One could extract the data either from the local volume or from the volume shadow copy store. If pulling from the local volume, one can use the ntfscopy utility [7] from TZWorks. This tool allows one to (a) input a volume (live or off-line), (b) specify the desired MFT entry to copy from and (c) extract the data associated with the MFT entry's cluster run as well as the metadata associated with that MFT entry. Below is an example of doing this with ntfscopy using MFT entry number 645130, which is for the slack entry shown above.

    ntfscopy -mft 645130 -dst c:\dump\645130.bin -partition c: -meta

For details on the ntfscopy syntax, refer to the ntfscopy readme file [7]. Briefly, the -mft option allows one to specify a source MFT entry to copy from. The -meta option says to create a separate file (in addition to the copied file) that contains the metadata about the specified MFT entry. The metadata file is created with the same name as the destination file, with the suffix .meta.txt appended. Included in the metadata file are many of the NTFS attributes of the target source file (or MFT entry), among other things, the cluster run and MACB timestamps. From the metadata, one can see whether the MFT sequence number is the same or not (which indicates whether the MFT record was reassigned to another file).


Eliminating the Duplicates

For those INDX records that have many slack entries, it is not uncommon for quite a few duplicate entries to be parsed and displayed in the output. Duplicate here means the filename and MACB timestamps are the same, but the location of the entry in the INDX record is different. For every unique location in the INDX record, wisp will happily parse the index entry and report its findings to the investigator. This can be quite annoying when some entries have more than a few duplicates and one is trying to wade through a lot of data, especially when carving entries out of slack data on all the directories in an entire volume.

To get rid of duplicates, one can invoke the -nodups switch. This tells wisp to analyze the extracted data and report only one instance of each entry. One thing to be aware of when using this option is that wisp will internally always analyze valid and slack entries, independent of what the user selects as input options. After all the data is extracted, wisp decides whether there are duplicate entries. It does this by comparing all slack entries with valid entries to see if there is a duplicate, and if not, comparing them to any slack entries that have already been marked as non-duplicates. What this means is, if one runs wisp and only wants non-duplicate slack entries, some slack entries will not be reported if identical valid entries are present.
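
The filtering just described can be sketched in a few lines. The entry representation below (a dict holding the name and a tuple of MACB timestamps) is an assumption made for illustration, not wisp's internal data structure:

```python
def dedup_slack(valid_entries, slack_entries):
    """Sketch of the -nodups pass: a slack entry is dropped when its
    (name, MACB timestamps) tuple matches a valid entry, or a slack
    entry that was already kept."""
    seen = {(e["name"], e["macb"]) for e in valid_entries}
    kept = []
    for e in slack_entries:
        key = (e["name"], e["macb"])
        if key not in seen:
            seen.add(key)     # future identical slack entries are duplicates
            kept.append(e)
    return kept
```

Note how a slack entry identical to a valid entry never survives the filter, which is exactly the caveat described above for runs that request only non-duplicate slack entries.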


Output Options

Which Output option yields the most data

The two output options that provide all the available metadata are the default (unstructured) output and the CSV (-csv) output. The other two output options (-csvl2t and -bodyfile) are geared toward generating cross-artifact timelines. As a result, they are more restrictive in their output fields, and the metadata that is parsable from these options has some limitations. Specifically, wisp makes use of a couple of free-form fields in these last two output options to inject as much useful data as possible, but this makes the data in those fields unstructured and therefore difficult to parse when post-processing the results. Therefore, the best option for metadata that needs to be parsed from wisp output is the -csv option.

Two Categories of Slack entries

The slack entries in the output have comments associated with them. The comments come in two categories: (a) entries that have not been deleted and (b) those that have been deleted.

In the first case, there is both a valid and an invalid (slack) entry pointing to the same file (one can tell this because the MFT entry and sequence numbers match up). The difference, aside from the fact that one is in slack space, is that something has changed in the metadata from one entry to the other. This could be that the MACB timestamps are different, or that the size of the file has changed. wisp annotates these modifications with a comment, denoted by [macb] and/or [size], depending on whether the MACB times or the size differ. In the case of timestamps changing, wisp may put the following: [m.c.], which translates to the modify and 'MFT change' timestamps, respectively, being different than in the valid entry. From this, one can recover some past data on a file that has gone through revisions.
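
As a rough sketch of how such a comparison could produce those comments (the field names and the exact tag formatting here are assumptions inferred from the description above, not wisp's actual implementation):

```python
def annotate_slack(valid, slack):
    """Approximate the comment wisp attaches to a slack entry that still
    has a matching valid entry (same MFT entry and sequence number)."""
    tags = []
    # Collect a letter for each MACB timestamp that differs
    diff = [flag for flag, field in (("m", "modified"), ("a", "accessed"),
                                     ("c", "mft_changed"), ("b", "birth"))
            if valid[field] != slack[field]]
    if diff:
        tags.append("[" + "".join(f + "." for f in diff) + "]")
    if valid["size"] != slack["size"]:
        tags.append("[size]")
    return " ".join(tags)
```

A slack entry whose modify and MFT-change timestamps differ from the valid entry would thus be tagged [m.c.], and one whose recorded size changed would carry [size].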

For the second case, wisp just annotates the slack entry as deleted. The term deleted here is only accurate in the sense that this index entry is no longer part of the directory containing the INDX record(s). Whether the item was moved to another subdirectory or actually deleted is unknown from the data presented here.

Corruption of the Index Entry

In cases where the slack data is obviously corrupted, wisp will either leave that field blank if using CSV output or annotate it with the word <corrupted> if using the default output.


Known Issues

For CSV (comma separated values) output, there are restrictions on the characters that are output. Since commas are used as the separator, any data containing a comma has the comma changed to a space. For the default (non-CSV) output, no changes are made to the data.


Available Options

  1. The -image <name of image | device name> [-offset <volume offset>] combination of switches is used to process an image. The image can be either of a volume or of an entire drive consisting of separate volumes. Only in the latter case is the -offset <volume offset> required. The volume offset can be specified in either base10 or base16 (hexadecimal). If using base16, prepend a 0x in front of the number and wisp will interpret it properly (e.g., 0x100000).
  2. (Windows only option). The -partition <drive letter> switch is used to process a live volume. The drive letter option should be used for Windows and the device name option should be used for Linux.
  3. (Windows only option). The -drivenum <drive #> [-offset <volume offset>] combination of switches is used to process a drive. The -offset <volume offset> needs to be specified to identify which volume to analyze. The volume offset can be specified in either base10 or base16 (hexadecimal). If using base16, prepend a 0x in front of the number and wisp will interpret it properly (e.g., 0x100000 or 1048576).
  4. The -vmdk <VMWare disk file> option is used to process a VMWare monolithic type volume. This option can be used repeatedly to specify a number of VMWare disk files, to handle the case where there are delta snapshot disk images.
  5. The -indxfile <datafile name> option is used to process INDX data that may have been extracted with another tool.
  6. If one knows the MFT entry of a directory, one can use the -mft <MFT entry to analyze> option to extract index metadata.
  7. To analyze the index entries given a path, one uses the -path <directory to analyze> option.
  8. The -valid switch says to only extract the metadata from the valid index entries.
  9. The -slack switch says to only extract the metadata from the slack index entries.
  10. The -all switch says to pull every index entry found. This includes both valid and invalid (slack) index entries.
  11. The -csv option outputs the data fields delimited by commas. Since record data can have commas, to ensure the fields are uniquely separated, any commas in the data get converted to semicolons.
  12. The -csvl2t option outputs the data fields in accordance with the log2timeline tool. This output needs to be independently verified for correctness.
  13. The -bodyfile option outputs the data fields in accordance with the body-file version3 format specified in the SleuthKit. The dates/timestamps output to the body-file are in UTC. So, if using the body-file in conjunction with the mactime.pl utility, one needs to set the environment variable TZ=UTC.
  14. The -base10 switch ensures all output is displayed in base-10 format instead of hexadecimal. Hexadecimal is only used for data that has a size artifact associated with the metadata.
  15. The -nodups switch tells wisp to output only unique entries. Duplicate entries are flagged if the name and MACB timestamps are the same.
  16. The -username <name to use> option tells wisp to use this name as the username to populate the output records.
  17. The -hostname <name to use> option tells wisp to use this name as the hostname to populate the output records.

Authentication and License File

wisp has authentication built into the binary. There are two authentication mechanisms: (a) a digital certificate embedded into the binary and (b) runtime authentication. For the first method, only the Windows and Mac OS-X versions have been signed by an X.509 digital code signing certificate, which is validated by Windows (or OS-X) during operation. If the binary has been tampered with, the digital certificate will be invalidated.

For the second (runtime authentication) method, the authentication does two things: (i) validates that wisp has a valid license and (ii) validates that the wisp binary has not been corrupted. The license needs to be in the same directory as the wisp binary. Furthermore, any modification to the license, either to its name or contents, will invalidate the license. The runtime binary validation hashes the executable that is running and fails the authentication if it detects any modifications.


Version history


References

  1. http://www.ntfs.com website
  2. http://en.wikipedia.org/wiki/NTFS website
  3. Brian Carrier's book, File System Forensic Analysis, sections on NTFS
  4. Willi Ballenthin article on NTFS INDX parsing.
  5. Getting to know your NTFS INDX Records presented May 3, 2012 on DFIRonline by Willi Ballenthin
  6. ntfswalk tool. http://tzworks.net/prototype_page.php?proto_id=12, TZWorks LLC.
  7. ntfscopy tool. http://tzworks.net/prototype_page.php?proto_id=9, TZWorks LLC.
  8. SleuthKit Body-file format, http://wiki.sleuthkit.org
  9. Log2timeline CSV format, http://log2timeline.net/
  10. VMWare Virtual Disk Format 1.1 Technical Note, http://www.vmware.com

Copyright © TZWorks, LLC, All Rights Reserved
Contact Info: jon@tzworks.net


LICENSE_AUTHENTICATION LkE+I3onEglrPHK/Y/205xbfsmbX64CdjuJgGB9PAwO985yzTjr0kRgrNy32CAkEUVMyMpf9T3bX M/KWGioc04izjVkjXjX5KLjGjat4Y9A/fTSSqz+tSUlRgInElEZmhp+Hcv3iNBcAJYQYnCoblNsK dPb/erBz3Q25MLo5y7YSYFuX1nsdnwsCb3OnW8eijqFERCHC/twYwAW9dGkYuj4vhXaUgLYELa+G 5KhASUyUpfO5DroXPE8/f+cpw4jCNf+DPt/3P7KaM3sJqCtsz+48104UIveIZtO4QDeph2Ah7WHx EY9v0z1xMt4xKIWJOfm4hkP614U/c7XduxuBOQ==