
There is a very similar post on the forum (POST), but I just can't figure out how to do it in my example.

My current code, with an explanation below:

for i in dfuser.appid:
    print i                                          # current appid
    d = dfbinary.loc[dfbinary['appid'] == i]         # row of dfbinary whose appid equals i
    print d
    glist = dfbinary.columns[dfbinary.loc[i] == 1]   # genres flagged 1 -- but .loc[i] selects the row with *label* i, not appid i
    print glist

I have a dataframe listing users and their apps (dfuser), and another dataframe with the genres of all the apps (dfbinary; an app may have more than one genre). I want to find which genre is most popular for each user.

My code mostly works, except that glist is not looking up the appid I want; instead it looks up the row with index i. For example, when i = 10 it finds the app at row index 10 (the 11th row).

This is what it prints:

   10
   appid  Accounting  Action  Adventure  Animation&Modeling  AudioProduction... 
0   10.0         0.0     1.0        0.0                 0.0              0.0  
[1 rows x 23 columns]
Index([u'Action'], dtype='object') 

(And this output just happens to be correct, because here the row index coincides with the appid.)

– Thodoris P

1 Answer


Firstly, whenever you write an explicit loop with pandas, you are probably doing it wrong!

You need to use merge to combine the two dataframes, selecting only the user and genre columns; it works just like an SQL join. Then you have a table keyed on user/genre, and you can groupby("user").count(). No explicit loops.
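
A minimal sketch of that approach, with hypothetical toy data (the question only names appid and the genre columns; the user column name is an assumption). With 0/1 genre flags, summing per user plays the role of the count above:

import pandas as pd

# Hypothetical stand-ins for the two dataframes in the question.
dfuser = pd.DataFrame({"user": [1, 1, 2], "appid": [10, 20, 10]})
dfbinary = pd.DataFrame({"appid": [10, 20], "Action": [1, 0], "Adventure": [0, 1]})

# Merge on the shared key -- works just like an SQL join.
merged = dfuser.merge(dfbinary, on="appid")

# Sum the 0/1 genre flags per user; no explicit loop needed.
genre_cols = [c for c in dfbinary.columns if c != "appid"]
counts = merged.groupby("user")[genre_cols].sum()

print(counts.idxmax(axis=1))  # most popular genre for each user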

– simon
  • Well, of course I thought about that. The problem is that both files are big and the system cannot handle such a big file if I merge them. Imagine 1 million rows x 30 columns. @simon – Thodoris P Nov 02 '16 at 23:56
  • You don't need 30 columns though. So select what you need first, then merge. – simon Nov 03 '16 at 00:03
  • The genres are nearly 30, so yeah, unfortunately I need them. – Thodoris P Nov 03 '16 at 00:39
  • I did merge the two files and got a huge one. Now I am trying the groupby, but I get a MemoryError. Is there maybe a way to iterate over users? Something like `for user in df.UserID`. – Thodoris P Nov 03 '16 at 01:22
  • I thought of chunksize, but the number of rows per user is not fixed, so I might slice some users in half. – Thodoris P Nov 03 '16 at 02:03
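
A minimal sketch of the chunked approach from the comments, assuming the merged data sits in a CSV (the merged.csv filename and the UserID column name are hypothetical). Partial per-user sums from each chunk can be added together afterwards, so it does not matter if a user's rows are split across chunk boundaries:

import pandas as pd

totals = None
# Stream the (hypothetical) merged file in pieces instead of loading it whole.
for chunk in pd.read_csv("merged.csv", chunksize=100000):
    genre_cols = [c for c in chunk.columns if c not in ("UserID", "appid")]
    partial = chunk.groupby("UserID")[genre_cols].sum()
    # Adding partial sums is safe even when a user straddles two chunks.
    totals = partial if totals is None else totals.add(partial, fill_value=0)

print(totals.idxmax(axis=1))  # most popular genre per user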