Continuing on in our Encryption 101 series, where we gave a malware analyst’s primer on encryption and demonstrated encryption techniques using ShiOne ransomware, we now look at what it takes to break an encryption. In order for something as powerful as encryption to break, there needs to be some kind of secret flaw. That flaw is often a result of an error in implementation.
There are a number of things that can go wrong for someone who is implementing encryption. What’s difficult is being able to identify and analyze the methods a programmer used for encryption and look for any weaknesses to exploit.
These weaknesses can be anything from weak encryption algorithms and weak key generators to server-side vulnerabilities and leaked keys.
Before you can even attempt to find the weakness, you must first know what was the encryption algorithm being used. A lot of times, it’s as simple as looking at the API calls. If this is the case, it can be quite simple to identify the algorithm. This was the case for the previous ShiOne walkthrough.
There are times, however, where the encryption is statically compiled into the malware or even a custom written encryption algorithm is used. When this is the case, you must be able to understand the inner workings of encryption algorithms to be able to identify code.
A file’s content will be encrypted and written back into the file, so a quick method to narrow down the general region where the encryption lies is to simply xref the ReadFile and WriteFile API calls. The encryption implementation will likely be performed between these two points.
When looking for statically compiled encryption code, as we mentioned, you will not have the luxury of searching for any API calls. A basic understanding of some of the low-level details of how these encryption algorithms work will be necessary.
Starting off, below, we have the high-level flow of AES algorithm. In general, most synchronous encryption algorithms have a similar flow to this; the differences may be the types of mathematical operations performed, but the core concepts remain the same. So, understanding AES will be enough of a starting point to help identify other types going forward in a real-world analysis.
With AES, being that it is a synchronous encryption algorithm, it performs a series of mathematical and logical operations on three things working together:
Depending on the flavor of AES and key size, the flow will be slightly different. In the picture above, you see a loop involving a few blocks:
What is happening in these steps is the file data is read into a matrix of a fixed number of bytes. In this case, it’s 16 bytes, but depending on the algorithm, it could be anything. Here are the rounds of steps:
Each set of these four series of operations is considered one round. AES can have 10 to 14 rounds. This means that when you are looking for the encryption code inside of a binary, it will likely be a long function with a lot of repetitive-looking code. This is one aspect that can help you identify it as encryption code when looking though the binary.
Here is another example of a round of encryption, likely from a different flavor of AES or similar synchronous crypto:
As you can see, the order of operations is a bit different. These kinds of details are not too important to us because we are not cryptographers. In general, we are not looking to find the weakness in AES algorithm itself, we are looking to find a weakness in the implementation. The reason for going into such detail on the inner workings of AES is only to give you an understanding of how it works so that you can identify it in code when you see it in the wild.
I will point you to a previous analysis we did of the Scarab ransomware. This was an example from which the code above was taken. They were encrypting files using statically compiled AES—no API calls. We had to do some research on the inner workings of various encryption methods to be able to properly identify what the algorithm was actually doing.
The details on the number of sets of these operations in this function was one of the main indicators to us as to which algorithm this code belongs.
I am including this image from the previous article once again just to remind about many encryption methods are being used in a single ransomware. This is good to keep an eye out for and not to be confused when you find multiple encryptions being used. Here, we have the flow chart showing the file encryption but also the algorithm that encrypts the previous key. Although it is not the encryption that is modifying the file itself, it will be what is used to keep the file encryption key secure. Both areas are points of weakness when looking to break encryption.
The point is that any number of combinations of encryption can technically be used, as it is up to the author. You must be able to understand and identify each one and the role it plays in the overall scheme. It may be that a single encryption usage was implemented incorrectly and can be broken, and it may be a combination of a few things that together cause a hole in the overall scheme.
A good starting point when looking for weaknesses in encryption is by looking at the encryption key generators, which in most cases are just some form of a random number generator.
If you have ever read anything about encryption, you will likely have come across someone mentioning the importance of the random number generator. The reason for this is that if you can force the output of a random number generator to reproduce the same value that was generated during a previous encryption, you will likely be able to recreate the original encryption keys.
An example of this is shown below. The system time is being used as the seed for a weak random number generator.
For the most part, any computer algorithm can only perform a finite series of operations. If the inputs to a function are the same, the output must also be the same. It is quite logical. In the case of random generators, the ingenuity is in taking enough inputs to seed the random value so that the output is not east to recreate. For example, some weak generators take the time of day as an input. Although this is, in a way, obscure, the conditions can definitely be recreated. What is necessary is to use enough semi-random inputs to give you enough entropy.
As you can see above, a more solid random generator may sample audio data, in addition to the time of day, and use mouse input and a number of other elements to try to make the inputs as random as possible. This requires an unreasonable amount of operations to brute force or recreate.
Here is a theoretical example for ransomware using a weak generator referred to as RNG. Suppose that the ransomware used a RNG-seeded with the current time in microseconds and the encryption is a standard algorithm. These are the basic steps for an attack:
In this scenario, a brute force attack is completely within reason. Now, if the RNG used milliseconds, in combination with the number of processes running at the given time, that adds a bit more complexity. It would take the initial 10,000,000 possibilities multiplied by the range of potential processes running on the machine. You can assume it might be somewhere between 5 and 25 processes. So now, that initial 10,000,000 attempts becomes 200,000,000. It is still iterate-able, but has added more complexity. You get the point.
If you add enough parameters, or parameters with a lot of possible outcomes, the number will eventually become so big where a brute force attempt would not be possible in your lifetime, as shown below.
Below is a list of a few examples of ransomware that were successfully broken and the methods used.
The DES algorithm was developed in the 1970s and was widely used for encryption. It is now considered a weak encryption algorithm because of its key size. The amount of bits generated as the key for an encryption algorithm is one of the considerations for the strength of an algorithm. For example, there was a contest to crack a 40-bit cipher which was won by a student using a few hundred machines at his university. It took only three and half hours. The bigger the size of the key, the harder it will be to crack an encryption—that is, without knowing anything about it.
Not to say that the common analyst has access to such resources, but I just wanted to give you a better understanding of why an encryption algorithm might be considered weak.
Often times, you can get an initial idea of whether the encryption method is solid or not by simply looking at a visualization of the files.
As you can see here, there is low entropy, and the data within the encrypted file shows similarities to the original plaintext. This could mean that there is a weakness to be exploited.
Let’s compare this to a file encrypted with a solid algorithm. You will be able to tell the difference in the high entropy result from encryption:
The file visualization can also be a good starting point when researching to see if a given ransomware is able to be decrypted. It can also point you in the direction of which part of the process you may be looking to attack to break the encryption. In this case, the entropy is good, which leads us to believe that the encryption might be strong. But as you saw from the list above, Cerber was broken by exploiting a server-side vulnerability. So although the encryption itself was strong, a side channel was attacked in order to create a decryptor.
In this post, we covered the need for identification and classification of the encryption algorithm used in order to look for weaknesses. We then went through a primer on identifying what the code might look like. We covered various weaknesses that can potentially be exploited and walked through a theoretical example of a scenario where a network admin might be able to decrypt the ransomware.
Tune in for part four, the final part of our Encryption 101 series, where we will go through the code of a weak ransomware and walk through, line by line, the process of creating a decryptor.