MEDS: Malware Evolution Discovery System

03:30 PM - 04:00 PM on August 16, 2014, Room 701

Antonio Cesar Vargas

Audience level:
Core Python


This talk covers my experience building a Python framework that constructs phylogenetic (lineage) trees of Android malware and uses machine learning within the MEDS framework to predict generative malware. Most of this work comes from my master's thesis. The goal is to release the framework as an open source project for the benefit of the cybersecurity and digital forensics communities. The talk also aims to make the Python community aware of how Python is being used to fight malware and make the Internet a safer place.


Malware, or malicious software, affects every kind of computing device, including personal computers, dedicated servers and, more recently, mobile devices such as smartphones and tablets. The information stored on these devices makes them attractive targets for cybercriminals seeking illegal financial gain, and for corporate espionage or even strategic operations by government agencies. Yet traditional detection measures are increasingly ineffective against the sheer number of malware variants. To make matters worse, these variants are becoming commodity products, manufactured by an underground industry that meets the demand for products able to bypass current anti-malware technologies. Consequently, most current malware is not new, since developing malware from scratch is not economically viable for the underground malware industry. Instead, most malware found in the wild is a modification or feature upgrade of previously created malware.

This talk frames the malware problem from the perspective of the evolutionary production of malware. The goal is to design an architecture that allows researchers to discover generative malware, and to develop an initial implementation of the Malware Evolution Discovery System (MEDS) in Python. MEDS supports the creation of phylogenetic trees of malware and attempts to predict generative malware by applying two models of supervised regression analysis to malware samples and their corresponding phylogenetic tree. Finally, MEDS is intended for release as an open source project, giving the cybersecurity and digital forensics communities a tool that was previously unavailable to them.
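
To give a feel for what a malware lineage tree looks like in code, here is a minimal sketch. It is not the MEDS implementation, which this abstract does not describe. It assumes each sample is reduced to a set of features (hypothetical Android permission names here) plus a first-seen order, and it attaches each sample to the most similar earlier sample by Jaccard similarity. The sample names, the `build_lineage` helper and the `min_similarity` threshold are all illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical sample record: (name, first-seen order, feature set).
Sample = Tuple[str, int, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def build_lineage(
    samples: List[Sample], min_similarity: float = 0.3
) -> Dict[str, Optional[str]]:
    """Map each sample name to its assumed parent, or None for a root.

    A sample's parent is the earlier-seen sample with the highest Jaccard
    similarity, provided that similarity reaches min_similarity.
    """
    ordered = sorted(samples, key=lambda s: s[1])
    parents: Dict[str, Optional[str]] = {}
    for i, (name, _, feats) in enumerate(ordered):
        best_parent: Optional[str] = None
        best_score = 0.0
        # Only samples seen earlier can be ancestors.
        for prev_name, _, prev_feats in ordered[:i]:
            score = jaccard(feats, prev_feats)
            if score >= min_similarity and score > best_score:
                best_parent, best_score = prev_name, score
        parents[name] = best_parent
    return parents


# Illustrative data only; these are not real malware samples.
samples = [
    ("variant_a", 1, frozenset({"SEND_SMS", "INTERNET"})),
    ("variant_b", 2, frozenset({"SEND_SMS", "INTERNET", "READ_CONTACTS"})),
    ("variant_c", 3, frozenset({"SEND_SMS", "INTERNET", "READ_CONTACTS",
                                "RECORD_AUDIO"})),
    ("variant_d", 4, frozenset({"CAMERA"})),
]

lineage = build_lineage(samples)
for child, parent in lineage.items():
    print(f"{child} <- {parent}")
```

In this toy data each variant adds a feature to its predecessor, so the sketch chains them together, while the unrelated sample becomes a separate root. A real system would need far richer features and a more careful tree-construction method than this nearest-ancestor heuristic.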