Notre contribution à la communauté de la sécurité de l'information et des systèmes d'information

Pokedex : « Machoke, the Superpower Pokémon. With enough strength to lift a giant truck with one hand, Machoke are often used for extremely heavy work. »

After some time spent polishing the code, we proudly release Machoke, the CFG-based hash algorithm we introduced during a small talk at r2con in Barcelona in September.

Machoke

Fuzzy hash and CFG

A fuzzy hash is a hash that can match inputs with homologies, therefore, this kind of hash is very useful in malware classification because it gives us the possibiity to classify malwares families by similarities. Currently, such type of hash includes algorithms like ssdeep, tlsh etc.

Machoke is a fuzzy hash using the control-flow graph of a sample as input for hashing. The control-flow graph is a representation, using graph notation, of all paths that might be traversed by a program during its execution (see below for an example of a CFG of a function as seen by radare2).

control-flow graph exampleCFG-based hashing can have many benefits over other classical means of malware classification like :

  • ability to resist minor changes in the samples being classified
  • allowing clusterisation (like ssdeep)

Original work

This implementation is based on Machoc, originally published by ANSSI during SSTIC2015 as a part of polichombr (https://github.com/ANSSI-FR/polichombr). The original algorithm is the work of @Heurs.

Our implementation is roughly the same, but unlike ANSSI’s Machoc, is implemented using radare2 and r2pipe instead of miasm or IDApython.

Machoke

In order to be able to compute such hash, this algorithm uses is the following:

  1. List the functions of the binary
  2. For a given function:
    • Label all the nodes (or code blocks) and calls of the graph
    • Record all the connection between the nodes as a string
    • Perform the murmurhash3 of the string produced
  3. Iterate for each functions and concatenate the strings produced

Machoke hash format

The size of the hash produced is directcly linked to the number of functions in the binary. Therefore, it is most likely that the bigger the sample you try to machoke, the bigger the machoke hash produced.

r2con

Being implemented in python using r2pipe, Machoke was introduced at r2con2017 in Barcelona.
The detail of the algorithm behind Machoke, along with the story behind the name can be viewed here.

The slides are available here.

Code : https://github.com/conix-security/machoke

CONIX

By | 2018-03-01T15:50:21+00:00 24/10/2017|Non classé|