2

I'm new to pandas and I want to get the number of instances for each person and feed it into another DataFrame as a column. I removed the NaN values from the DataFrame before grouping by the user column.

I've tried this, but it doesn't seem to work:

DF["NumInstances"] = userGrp["user"].value_counts()

I've looked all over the internet but can't seem to find a solution. Please help.

Edit: Sample Data and Expected Outcome

[{"user" : "4",
"Instance": "21"},
 {"user" : "4",
"Instance": "6"},
{"user" : "5",
"Instance" : "546453"}]

Expected outcome:

DataFrame =

[{"user":"4",
 "NumInstances" : "2"},
 {"user":"5",
 "NumInstances" : "1"}]

So basically, it counts how many instances occur for each user across the data entries.

marc_s

3 Answers

2

Based on your sample input, you can do this:

In [2535]: df = pd.DataFrame([{"user" : "4", 
      ...: "Instance": "21"}, 
      ...:  {"user" : "4", 
      ...: "Instance": "6"}, 
      ...: {"user" : "5", 
      ...: "Instance" : "546453"}])  

In [2539]: df.groupby('user', as_index=False).count()
Out[2539]: 
  user  Instance
0    4         2
1    5         1
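
To match the column name in the expected outcome, the counted column can be renamed afterwards. This is only a sketch built on the sample data from the question; the variable name `counts` is my own choice, and selecting `"Instance"` explicitly follows the alteration mentioned in the comment below.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame([
    {"user": "4", "Instance": "21"},
    {"user": "4", "Instance": "6"},
    {"user": "5", "Instance": "546453"},
])

# Count the non-null "Instance" values per user, then rename the
# resulting column to "NumInstances"
counts = (
    df.groupby("user", as_index=False)["Instance"]
      .count()
      .rename(columns={"Instance": "NumInstances"})
)
print(counts)
```

Selecting the column before calling `count()` keeps the result to a single count column even if the real DataFrame has more columns than the sample.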
Mayank Porwal
  • Yeah, I made a slight alteration and it seemed to have worked: InstancePerUser.groupby('user', as_index=False)['Instance'].count() – AgentArachnid Jun 10 '20 at 16:30
0

If DF is the name of your DataFrame and "user" is the name of the column you want to group by, then try:

count = DF.groupby("user").count()

print(count)
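
Two details of `count()` are worth keeping in mind: without `as_index=False` the `user` values become the index of the result, and `count()` tallies non-null values per column, whereas `size()` counts rows. The sketch below uses a made-up frame with one missing value (an assumption for illustration; the question says NaN values were already dropped) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same users as the question, but one Instance is missing
DF = pd.DataFrame({
    "user": ["4", "4", "5"],
    "Instance": ["21", np.nan, "546453"],
})

# count(): "user" becomes the index; each remaining column holds its
# number of non-null values per group
per_column = DF.groupby("user").count()

# size(): one number per group, the row count, NaN rows included
rows_per_user = DF.groupby("user").size()

print(per_column)
print(rows_per_user)
```

If the NaN rows really have been removed beforehand, both approaches give the same numbers.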

Filippo Sebastio
0

I used the following solution, which creates a new DataFrame containing both a "user" column and a "NumInstances" column:

df_counts = df.groupby(['user']).size().reset_index(name='NumInstances')
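
If the counts need to end up as a column in a different DataFrame, as the question asks, one option is to look them up by user with `Series.map`. Assigning a grouped result directly, as in the question's attempt, aligns on the index rather than on the user values, which is probably why it did not work. In the sketch below, `other` is a hypothetical second DataFrame that I made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame([
    {"user": "4", "Instance": "21"},
    {"user": "4", "Instance": "6"},
    {"user": "5", "Instance": "546453"},
])

# Hypothetical target DataFrame; user "6" has no rows in df
other = pd.DataFrame({"user": ["4", "5", "6"]})

# Series indexed by user, holding the number of rows for each user
user_counts = df["user"].value_counts()

# Look up each user's count; users missing from df get 0
other["NumInstances"] = other["user"].map(user_counts).fillna(0).astype(int)

# Alternative: attach the per-user count to every row of df itself
df["NumInstances"] = df.groupby("user")["user"].transform("size")

print(other)
print(df)
```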

Hope it helps.

Flo