The FireEye Labs Advanced Reverse Engineering (FLARE) Team continues to share knowledge and tools with the community. We started this blog series with a script for Automatic Recovery of Constructed Strings in Malware. As always, you can download these scripts at the following location: https://github.com/fireeye/flare-ida. We hope you find all these scripts as useful as we do.
During my summer internship with the FLARE team, my goal was to develop IDAPython plug-ins that speed up the reverse engineering workflow in IDA Pro. While analyzing malware samples with the team, I realized that a lot of time is spent looking up information about functions, arguments, and constants at the Microsoft Developer Network (MSDN) website. Frequently switching to the developer documentation can interrupt the reverse engineering process, so we thought about ways to integrate MSDN information into IDA Pro automatically. In this blog post we will release a script that does just that, and we will show you how to use it.
The MSDN Annotations plug-in integrates information about functions, arguments and return values into IDA Pro’s disassembly listing in the form of IDA comments. This allows the information to be integrated as seamlessly as possible. Additionally, the plug-in is able to automatically rename constants, which further speeds up the analyst workflow. The plug-in relies on an offline XML database file, which is generated from Microsoft’s documentation and IDA type library files.
Table 1 shows what benefit the plug-in provides to an analyst. On the left you can see IDA Pro’s standard disassembly: seven arguments get pushed onto the stack and then the CreateFileA function is called. Normally an analyst would have to look up function, argument and possibly constant descriptions in the documentation to understand what this code snippet is trying to accomplish. To obtain readable constant values, an analyst would be required to research the respective argument, import the corresponding standard enumeration into IDA and then manually rename each value. The right side of Table 1 shows the result of executing our plug-in showing the support it offers to an analyst.
The most obvious change is that constants are renamed automatically. In this example, 40000000h was automatically converted to GENERIC_WRITE. Additionally, each function argument is renamed to a unique name, so the corresponding description can be added to the disassembly.
Table 1: Automatic labelling of standard symbolic constants
In Figure 1 you can see how the plug-in enables you to display function, argument, and constant information right within the disassembly. The top image shows how hovering over the CreateFileA function displays a short description and the return value. In the middle image, hovering over the hTemplateFile argument displays the corresponding description. And in the bottom image, you can see how hovering over dwShareMode, the automatically renamed constant displays descriptive information.
Figure 1: Hovering function names, arguments and constants displays the respective descriptions
How it works
Before the plug-in makes any changes to the disassembly, it creates a backup of the current IDA database file (IDB). This file gets stored in the same directory as the current database and can be used to revert to the previous markup in case you do not like the changes or something goes wrong.
The plug-in is designed to run once on a sample before you start your analysis. It relies on an offline database generated from the MSDN documentation and IDA Pro type library (TIL) files. For every function reference in the import table, the plug-in annotates the function’s description and return value, adds argument descriptions, and renames constants. An example of an annotated import table is depicted in Figure 2. It shows how a descriptive comment is added to each API function call. In order to identify addresses of instructions that position arguments prior to a function call, the plug-in relies on IDA Pro’s markup.
Figure 2: Annotated import table
Figure 3 shows the additional .msdn segment the plug-in creates in order to store argument descriptions. This only impacts the IDA database file and does not modify the original binary.
Figure 3: The additional segment added to the IDA database
The .msdn segment stores the argument descriptions as shown in Figure 4. The unique argument names and their descriptive comments are sequentially added to the segment.
Figure 4: Names and comments inserted for argument descriptions
To allow the user to see constant descriptions by hovering over constants in the disassembly, the plug-in imports IDA Pro’s relevant standard enumeration and adds descriptive comments to the enumeration members. Figure 5 shows this for the MACRO_CREATE enumeration, which stores constants passed as dwCreationDisposition to CreateFileA.
Figure 5: Descriptions added to the constant enumeration members
Preparing the MSDN database file
The plug-in’s graphical interface requires you to have the QT framework and Python scripting installed. This is included with the IDA Pro 6.6 release. You can also set it up for IDA 6.5 as described here (http://www.hexblog.com/?p=333).
As mentioned earlier, the plug-in requires an XML database file storing the MSDN documentation. We cannot distribute the database file with the plug-in because Microsoft holds the copyright for it. However, we provide a script to generate the database file. It can be cloned from the git repository at https://github.com/fireeye/flare-ida together with the annotation plug-in.
You can take the following steps to setup the database file. You only have to do this once.
- Download and install an offline version of the MSDN
documentationYou can download the Microsoft Windows SDK MSDN
documentation. The standalone installer can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=18950.
Although it is not the newest SDK version, it includes all the
needed information and data extraction is straight-forward.As shown
in Figure 6, you can select to only install the help files. By
default they are located in C:\Program Files\Microsoft
Figure 6: Installing a local copy of the MSDN documentation
- Extract the files with an archive manager like 7-zip to a directory of your choice.
- Download and extract tilib.exe from Hex-Ray’s
download page at https://www.hex-rays.com/products/ida/support/download.shtml
To allow the plug-in to rename constants, it needs to know which enumerations to import. IDA Pro stores this information in TIL files located in %IDADIR%/til/. Hex-Rays provides a tool (tilib) to show TIL file contents via their download page for registered users. Download the tilib archive and extract the binary into %IDADIR%. If you run tilib without any arguments and it displays its help message, the program is running correctly.
- Run MSDN_crawler/msdn_crawler.py
<path to extracted MSDN documentation> <path to
tilib.exe> <path to til files>
With these prerequisites fulfilled, you can run the MSDN_crawler.py script, located in the MSDN_crawler directory. It expects the path to the TIL files you want to extract (normally %IDADIR%/til/pc/) and the path to the extracted MSDN documentation. After the script finishes execution the final XML database file should be located in the MSDN_data directory.
You can now run our plug-in to annotate your disassembly in IDA.
Running the MSDN annotations plug-in
In IDA, use File - Script file... (ALT + F7) to open the script named annotate_IDB_MSDN.py. This will display the dialog box shown in Figure 7 that allows you to configure the modifications the plug-in performs. By default, the plug-in annotates functions, arguments and rename constants. If you change the settings and execute the plug-in by clicking OK, your settings get stored in a configuration file in the plug-in’s directory. This allows you to quickly run the plug-in on other samples using your preferred settings. If you do not choose to annotate functions and/or arguments, you will not be able to see the respective descriptions by hovering over the element.
Figure 7: The plug-in’s configuration window showing the default settings
When you choose to use repeatable comments for function name annotations, the description is visible in the disassembly listing, as shown in Figure 8.
Figure 8: The plug-in’s preview of function annotations with repeatable comments
Similar Tools and Known Limitations
Parts of our solution were inspired by existing IDA Pro plug-ins, such as IDAScope and IDAAPIHelp. A special thank you goes out to Zynamics for their MSDN crawler and the IDA importer which greatly supported our development.
Our plug-in has mainly been tested on IDA Pro for Windows, though it should work on all platforms. Due to the structure of the MSDN documentation and limitations of the MSDN crawler, not all constants can be parsed automatically. When you encounter missing information you can extend the annotation database by placing files with supplemental information into the MSDN_data directory. In order to be processed correctly, they have to be valid XML following the schema given in the main database file (msdn_data.xml). However, if you want to extend partly existing function information, you only have to add the additional fields. Name tags are mandatory for this, as they get used to identify the respective element.
For example, if the parser did not recognize a commonly used constant, we could add the information manually. For the CreateFileA function’s dwDesiredAccess argument the additional information could look similar to Listing 1.
<?xml version="1.0" encoding="ISO-8859-1"?>
<description>All possible access rights</description>
Listing 1: Additional information enhancing the dwDesiredAccess argument for the CreateFileA function
In this post, we showed how you can generate a MSDN database file used by our plug-in to automatically annotate information about functions, arguments and constants into IDA Pro’s disassembly. Furthermore, we talked about how the plug-in works, and how you can configure and customize it. We hope this speeds up your analysis process!
Stay tuned for the FLARE Team’s next post where we will release solutions for the FLARE On Challenge (www.flare-on.com).