Who doesn’t love Pandas?

Last night I started looking into the “Analytics” of this very blog. It’s something I had promised myself to do often, but it never happened. Looking at the graphs prepared by the online (free) tool supplied by my provider was, and still is, such an unrewarding and dry experience. The fact is, the information I am truly looking for is simply not there.

On the other hand, the original log files (CSV) are available, but they are really long, and manipulating them with Excel is a tedious and (very) error-prone process, even for a “Spreadsheet Master” like me (I have been in marketing for almost 15 years now…).

Enter the Pandas… no, not those in the picture! I am referring to the Python Data Analysis Library, a tool I had heard of many times in the past but always ignored, as I considered it a thing for web jockeys… (scoff)!

Turns out I was so wrong! I took the “10 Minutes to pandas” intro and… well, 10 minutes later I was looking at the data I wanted, or rather the data I had dreamed of for so long!

To be fully honest, four hours later I was still there fiddling with that same data, but only because I could not stop playing with and refining my views!

Turns out all I needed were quite literally 3 lines of (Pandas) code. Here is the first one, where I import the CSV log file:

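import sys

import numpy as np
import pandas as pd

# Space-separated access log; sys.argv[1] is the log file path. Double-quoted
# fields (request, referrer, user agent) stay in one column each.
log = pd.read_csv(sys.argv[1], sep=' ', header=None,
                  names=['ip', 'B', 'C', 'DTime', 'E', 'Request', 'G', 'H', 'From', 'L', 'M', 'N'])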
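To see how `sep=' '` splits a typical access log line, here is a minimal sketch that feeds two made-up lines through the same `read_csv` call via `io.StringIO` instead of a file. It assumes the provider writes the Apache “combined” format (the post does not say which format it is), and the IP addresses and URLs are hypothetical.

```python
import io

import pandas as pd

# Two made-up lines in Apache "combined" log format (hypothetical sample data).
sample = (
    '203.0.113.7 - - [12/Mar/2014:21:15:02 +0100] "GET /2014/03/pandas/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"\n'
    '198.51.100.2 - - [12/Mar/2014:21:16:40 +0100] "GET /feed/ HTTP/1.1" 304 0 "-" "Feedly/1.0"\n'
)

# Same column names as in the post. The space separator splits the timestamp
# "[12/Mar/2014:21:15:02 +0100]" into DTime and E, while each double-quoted
# field (request, referrer, user agent) is kept whole.
names = ['ip', 'B', 'C', 'DTime', 'E', 'Request', 'G', 'H', 'From', 'L', 'M', 'N']
log = pd.read_csv(io.StringIO(sample), sep=' ', header=None, names=names)

# These sample lines have only 10 fields, so the extra names (M, N) are filled with NaN.
print(log[['ip', 'DTime', 'Request', 'G']])
```

The split timestamp is a side effect of using a plain space as separator; it is harmless here because the analysis only needs the `Request` column.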

Next, filtering the rows I want:

dff = log[log.Request.str.startswith('GET /201', na=False)]  # GET requests for /201x/... URLs; na=False drops rows with no Request
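The `'GET /201'` prefix presumably selects post pages because their URLs start with the year (a `/YYYY/MM/slug/` permalink scheme is an assumption here). This small sketch runs the filter on a hypothetical frame with one missing `Request`, showing why the `na` argument of `Series.str.startswith` matters: without `na=False` the missing value would put NaN into the boolean mask.

```python
import pandas as pd

# Hypothetical parsed log; the third request is missing (e.g. a malformed line).
log = pd.DataFrame({
    'ip': ['203.0.113.7', '198.51.100.2', '192.0.2.44', '203.0.113.7'],
    'Request': ['GET /2014/03/pandas/ HTTP/1.1', 'GET /feed/ HTTP/1.1',
                None, 'GET /2013/11/old-post/ HTTP/1.1'],
})

# Keep only GETs for year-prefixed URLs; na=False counts the missing
# request as "no match" instead of NaN.
dff = log[log.Request.str.startswith('GET /201', na=False)]
print(dff)
```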

Finally, grouping, counting and sorting the data:

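# Count hits per Request (non-null ip values), most requested first, top 25.
# sort_values replaces DataFrame.sort, which was removed in pandas 0.20.
dfgo = dff[['Request', 'ip']].groupby('Request').count().sort_values('ip', ascending=False).head(25)
print(dfgo)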
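The group/count/sort chain counts hits per URL. As a sketch on hypothetical data, the same ranking can also come from `value_counts()`, which counts rows per value and sorts them in descending order. The two agree as long as `ip` is never missing, since `count()` only counts non-null values.

```python
import pandas as pd

# Hypothetical filtered log (the output of the previous step).
dff = pd.DataFrame({
    'ip': ['203.0.113.7', '198.51.100.2', '192.0.2.44', '203.0.113.7', '192.0.2.44'],
    'Request': ['GET /2014/03/pandas/ HTTP/1.1'] * 3
               + ['GET /2013/11/old-post/ HTTP/1.1'] * 2,
})

# Group, count and sort as in the post.
dfgo = (dff[['Request', 'ip']]
        .groupby('Request')
        .count()
        .sort_values('ip', ascending=False)
        .head(25))

# Same ranking in one call: rows per Request, largest first.
top = dff['Request'].value_counts().head(25)

print(dfgo)
print(top)
```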

That’s it!!
As a bonus, I got to play with the Python plotting library Matplotlib, which is also well integrated with Pandas. Here are a couple more lines to get quite a refined bar chart to replace the crude printout:

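import matplotlib.pyplot as plt

dfgo.plot(kind='barh', legend=False)
plt.subplots_adjust(left=0.65)      # widen the left margin so the long request URLs fit
plt.title('Top Requests: ' + name)  # name: a label for the log (defined elsewhere in the script)
plt.show()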
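For running the script unattended (say, from cron), a minimal sketch that writes the chart to a PNG instead of opening a window might look like this. It assumes the non-interactive `Agg` backend; the data, the title's file name `access.log` and the output name `top_requests.png` are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: works without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical result of the group/count/sort step.
dfgo = pd.DataFrame(
    {'ip': [3, 2]},
    index=pd.Index(['GET /2014/03/pandas/ HTTP/1.1',
                    'GET /2013/11/old-post/ HTTP/1.1'], name='Request'),
)

ax = dfgo.plot(kind='barh', legend=False)
ax.invert_yaxis()                        # barh draws the first row at the bottom; flip it
ax.set_xlabel('Requests')
plt.title('Top Requests: access.log')    # hypothetical log file name
plt.subplots_adjust(left=0.65)           # room for the long URL labels
plt.savefig('top_requests.png')          # hypothetical output file
```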
This entry was posted in Python, Tips and Tricks, Tools.