Forgive me if this is somewhat simple, but I can't find any answers elsewhere.
I have a largish DataFrame of raw transactional data that I am breaking down to do aggregations by user.
My df looks like:
transid, userid, catname, amount, date, origin
1, id1, cat1, 300, 01/03/2014, cc
2, id1, cat1, 200, 02/03/2014, cc
3, id1, cat2, 250, 01/03/2014, ca
4, id2, cat1, 130, 01/03/2014, cc
5, id2, cat2, 150, 01/03/2014, cc
Circa 1 million transaction records across 115k users.
I am creating a wide table of aggregation metrics, with date and userid as the index:
date, userid, countoftrans, sumtrans
01/03/2014, id1, 3, 750
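For context, the basic per-date/per-user aggregation I'm doing looks roughly like this (the dict-style agg and the rename are just my sketch against the column names above):

#count transactions and sum amounts per (date, userid) pair
basic = df.groupby(['date', 'userid']).agg({'transid': 'count', 'amount': 'sum'})
basic.columns = ['countoftrans', 'sumtrans']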
This works happily enough, but when I get to the category slicing it becomes very manual.
I am creating Series and then merging them back into a DataFrame, but I am doing named slicing, which is a) manual and b) slow:
#take a slice from the df
cat1 = df[df.catname == "cat1"]
#group the slice by user
cat1grouped = cat1.groupby(cat1.userid)
#then do the calc on the grouped series
cat1count = cat1grouped.transid.count()
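What I think I want instead is to group by userid and catname together, so every category is handled in one pass. Something like this sketch (untested):

#one pass over all categories: group by user and category together
#(date dropped here, since the final result is one row per user)
percat = df.groupby(['userid', 'catname']).agg({'transid': 'count', 'amount': 'sum'})
percat.columns = ['countoftrans', 'sumtrans']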
Edit: The result set needs to be in the form of one line per userid with a wide set of columns, e.g. a count and sum per category.
So while the groupby aggregate does organize my data logically the way I want, I just don't understand GroupBy objects well enough to reshape the result into that wide form.
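I'm guessing the reshape involves unstack(), roughly like this (again a sketch; the flattened column names are my own invention):

#pivot the categories out into columns: one row per userid
wide = percat.unstack('catname').fillna(0)
#flatten the (metric, category) column MultiIndex into names like countoftrans_cat1
wide.columns = ['{0}_{1}'.format(metric, cat) for metric, cat in wide.columns]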
I have 12 parent categories, each with 3-5 subcategories, and I will have to do this across all of them, so I would like a succinct way to do it.
I have tried a for loop over the distinct catnames, inserting each result as I went, but that didn't really work. Is there a more idiomatic pandas way?
Thanks
Rob