0

I need some help with pandas. I have a DataFrame with a column of unique IDs and each ID has a few different application downloaded.

ID | AppID | Count
1  |  A    |   2
2  |  A    |   3
3  |  B    |   1
4  |  D    |   1
5  |  E    |   5

I am trying to groupby the ID and count the total number of appID for each ID.

Expected output:

ID | A | B | C | D | E....
1  | 2 | 0 | 1 |  8 |  5
2  | 3 | 6 | 7 |  4 |  6  
3  | 9 | 1 | 2 |  5 |  7
4  | 3 | 8 | 4 |  1 |  3
5  | 1 | 1 | 3 |  5 |  5

The code that I have tried are

t = df.groupby(['ID']).agg({i:'value_counts' for i in df.columns[1:]})

and

pd.crosstab(index=t['ID'], columns=t['count'])

The results I gotten

ID | AppID | Count
1  |  A    |   2
1  |  B    |   0
1  |  C    |   1
1  |  D    |   8
1  |  E    |   5

2  |  A    |   3
2  |  B    |   6
2  |  C    |   7
2  |  D    |   4
2  |  E    |   6
asgasega
  • 3
  • 2

1 Answers1

0

If you are looking to sum up the Count values, try:

df.groupby(['ID', 'AppID'])['Count'].sum().unstack(fill_value=0)

If instead you want the number of rows (the number of times each AppID appears for a given ID), regardless of your Count column, then try instead:

df.groupby(['ID', 'AppID']).count().unstack(fill_value=0)

In both cases, the value is established much like your original solution (but using only vectorized ops) and then turned into a wide df by using .unstack().

Pierre D
  • 24,012
  • 7
  • 60
  • 96