I stumbled across pandas and it looks ideal for simple calculations that I'd like to do. I have a SAS background and was thinking it'd replace proc freq -- it looks like it'll scale to what I may want to do in the future. However, I just can't seem to get my head around a simple task (I'm not sure if I'm supposed to look at pivot/crosstab/indexing
- whether I should have a Panel
or DataFrames
etc...). Could someone give me some pointers on how to do the following:
I have two CSV files (one for year 2010, one for year 2011 - simple transactional data) - The columns are category and amount
2010:
AB,100.00
AB,200.00
AC,150.00
AD,500.00
2011:
AB,500.00
AC,250.00
AX,900.00
These are loaded into separate DataFrame objects.
What I'd like to do is get the category, the sum of the category, and the frequency of the category, eg:
2010:
AB,300.00,2
AC,150.00,1
AD,500.00,1
2011:
AB,500.00,1
AC,250.00,1
AX,900.00,1
I can't work out whether I should be using pivot/crosstab/groupby/an index
etc... I can get either the sum or the frequency - I can't seem to get both... It gets a bit more complex because I would like to do it on a month by month basis, but I think if someone would be so kind to point me to the right technique/direction I'll be able to go from there.