I have a CSV file with three columns: users, text, and labels. Each user has multiple texts and labels. I want to find the label that occurs most frequently for each user, in order to determine that user's category.

I have tried:

for i in df['user'].unique():
    print (df['class'].value_counts())

which returns the same values, shown below, for every user:

4    3062
1    1250
0     393
3     281
2      13
Name: class, dtype: int64

I also tried

for h in df['user'].unique():
    g = Counter(df['class'])
    print (g)

and got

Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})

Here is the sample data. Please help.

A.Umar

1 Answer

For counting values by group, you can use groupby with pd.value_counts:

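import pandas as pd

# Toy data: one row per (user, class) observation
df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 1], [1, 2],
                   [2, 1], [2, 3], [2, 2], [2, 2], [2, 3], [2, 3]],
                  columns=['user', 'class'])

# Count each class within each user, then flatten the result into columns
res = df.groupby('user')['class'].apply(pd.value_counts).reset_index()
res.columns = ['user', 'class', 'count']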

print(res)

   user  class   count
0     1      1       3
1     1      2       2
2     1      3       1
3     2      3       3
4     2      2       2
5     2      1       1
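
To go from these counts to a single category per user, one option is to keep only the most frequent class in each group. This is a minimal sketch on the same toy frame as above, not necessarily the only way to do it. The name `top_class` is hypothetical, and ties are broken by whichever label `idxmax` happens to encounter first.

```python
import pandas as pd

# Same toy data as above: one row per (user, class) observation
df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 1], [1, 2],
                   [2, 1], [2, 3], [2, 2], [2, 2], [2, 3], [2, 3]],
                  columns=['user', 'class'])

# For each user, count the classes and keep the label with the largest count.
# `top_class` is a hypothetical name: a Series indexed by user.
top_class = df.groupby('user')['class'].agg(lambda s: s.value_counts().idxmax())

print(top_class)
```

With this data, user 1 maps to class 1 and user 2 maps to class 3, which matches the first row for each user in the table above.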
jpp
  • Excellent answer, but how do I write conditional statements against the count values? For example, if the count for a class is > 200, then category = 0 or 1 or 2...? – A.Umar Jun 03 '18 at 10:05
  • That's a separate question, but you can use `pd.cut` or `np.digitize` as per [this answer](https://stackoverflow.com/a/49382340/9209546). – jpp Jun 03 '18 at 10:28
  • Thanks for your quick and accurate responses. I have also looked at https://stackoverflow.com/help/someone-answers – A.Umar Jun 04 '18 at 02:16
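
As a rough illustration of the `pd.cut` suggestion from the comments, the sketch below bins a count column into numbered categories. The counts and the thresholds (50 and 200) are made up for the example, and the `category` column name is hypothetical; the real cut-offs would depend on the data.

```python
import pandas as pd

# Hypothetical per-class counts for a single user; the numbers are invented.
res = pd.DataFrame({'user': [1, 1, 1],
                    'class': [4, 1, 0],
                    'count': [350, 120, 15]})

# Assumed thresholds: (0, 50] -> 0, (50, 200] -> 1, (200, inf) -> 2.
bins = [0, 50, 200, float('inf')]
res['category'] = pd.cut(res['count'], bins=bins, labels=[0, 1, 2]).astype(int)

print(res)
```

Because `pd.cut` uses right-closed intervals by default, a count of exactly 200 falls into category 1 here, and only counts strictly greater than 200 land in category 2.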