data_hacking - Examples of using IPython, Pandas, and Scikit Learn


data_hacking is a collection of examples that use IPython, Pandas, and Scikit Learn to get the most out of your security data.


How to get this tool

To get this tool, use one of the methods listed below.

On Linux (Debian-based OS), run the following commands:

git clone 

cd data_hacking

sudo python setup.py install


Download directly from the following link:


How to execute

Most of the notebooks use relative paths to resources such as data files or images. The easiest way we have found to run IPython on a notebook is to change into its project directory and start IPython with this alias (put it in your .bashrc or equivalent):

alias ipython='ipython notebook --FileNotebookManager.notebook_dir=`pwd`'

cd data_hacking/fun_with_syslog

ipython (as aliased above)
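
If you would rather not define a shell alias, the same idea can be sketched in Python. This is only an illustration: `notebook_command` is a hypothetical helper and not part of data_hacking, and the `--FileNotebookManager.notebook_dir` option is copied from the alias above, so it may not be accepted by newer IPython releases.

```python
import os


def notebook_command(project_dir):
    """Build the command the alias expands to, for a given project directory.

    `notebook_command` is a hypothetical helper, not part of data_hacking.
    """
    project_dir = os.path.abspath(project_dir)
    # Same option as the alias, with the directory spelled out instead of `pwd`.
    return ["ipython", "notebook",
            "--FileNotebookManager.notebook_dir=" + project_dir]


if __name__ == "__main__":
    project = os.path.join("data_hacking", "fun_with_syslog")
    print(" ".join(notebook_command(project)))
    # To actually launch, run the command from inside the project directory
    # so that the notebooks' relative paths resolve, e.g.:
    #   import subprocess
    #   subprocess.call(notebook_command(project), cwd=project)
```

Running the script only prints the command. The launch step is left commented out so nothing starts until you have confirmed that the option works with your IPython version.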


Python Modules Used:

  • IPython: Architecture for interactive computing and presentation
  • Pandas: Python Data Analysis Library
  • Scikit Learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
  • Matplotlib: Python 2D plotting library
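
Before opening a notebook, it can help to confirm that the modules listed above are importable. The following is a minimal sketch, assuming the usual import names (`IPython`, `pandas`, `sklearn`, `matplotlib`); `check_modules` is a hypothetical helper and not part of data_hacking.

```python
import importlib

# Display name -> import name for the modules listed above.
MODULES = {
    "IPython": "IPython",
    "Pandas": "pandas",
    "Scikit Learn": "sklearn",
    "Matplotlib": "matplotlib",
}


def check_modules(modules=MODULES):
    """Return {display name: version string, or None if not importable}."""
    found = {}
    for display_name, import_name in modules.items():
        try:
            module = importlib.import_module(import_name)
        except ImportError:
            found[display_name] = None
        else:
            # Fall back to "unknown" if a module does not expose __version__.
            found[display_name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_modules().items():
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

Any module reported as NOT INSTALLED needs to be installed before the notebooks that depend on it will run.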


Example notebooks (event and year in parentheses):

  • Detecting Algorithmically Generated Domains (BSidesDFW 2013)
  • Hierarchical Clustering of Syslogs (BSidesDFW 2013)
  • Exploration of data from Malware Domain List (BSidesDFW 2013)
  • SQL Injection (Shmoocon 2014)
  • Browser Agent Fingerprinting (Shmoocon 2014)
  • PE File Classification (BSides 2014)
  • PCAP Exploration (BSidesATX 2014)
  • Drive-By PCAP Analysis (ISSW 2014)
  • Mach-O Classification (SANS DFIR 2014)
  • Yara Clustering (BSides Las Vegas 2014)
  • SWF Classification (ShmooCon 2015)
  • Java Class File Classification (ShmooCon 2015)





This article was contributed by Jason Jacobs from Guyana. Jason is a member of the Caribbean CSPA.
