1

If this is a duplicate, please link me to the duplicate. I have not found any other posts that have answered my question.

I have a dataframe, knn_res, with the following dimensions and data:

            username  Prediction  is_bot
0         megliebsch           1       0                                                                                1         megliebsch           1       0
2         megliebsch           1       0
3         megliebsch           1       0
4         megliebsch           1       0
...              ...         ...     ...
1220     ARTHCLAUDIA           1       1
1221     ARTHCLAUDIA           1       1                                                                                1222     ARTHCLAUDIA           1       1
1223     ARTHCLAUDIA           1       1
1224  ASSUNCAOWALLAS           1       1

[1225 rows x 3 columns]

What I would like to do is, for each username, count the number of predictions where prediction = 1 and where prediction = 0 and create two new columns with those values. For example using below dataset:

| username | prediction | is_bot |
|:--------:|:----------:|:------:|
|    foo   |      1     |    1   |
|    foo   |      1     |    1   |
|    foo   |      1     |    1   |
|    foo   |      0     |    1   |
|    foo   |      0     |    1   |
|   foo1   |      0     |    1   |
|   foo1   |      0     |    1   |
|   foo1   |      0     |    0   |
|   foo1   |      0     |    0   |
|   foo1   |      1     |    0   |
|   foo1   |      1     |    0   |
|   foo1   |      0     |    0   |
|   foo2   |      1     |    0   |
|   foo2   |      1     |    0   |
|   foo2   |      1     |    1   |

I would want:

| username | count_bot  | count_human |
|:--------:|:----------:|:-----------:|
|    foo   |      3     |      2      |
|   foo1   |      2     |      5      |
|   foo2   |      3     |      0      |

where the following logic applies:

For each row, if Prediction == 1, then increase the count_bot counter. If Prediction == 0, then increase the count_human counter. Then, append the totals for each row and group by.

So far, I have tried referencing this question and tried the following:

knn_res['count_bot'] = knn_res[knn_res.Prediction == 1].count()
print(knn_res)

Which yields:

            username  Prediction  is_bot  count_bot
0         megliebsch           1       0        NaN
1         megliebsch           1       0        NaN
2         megliebsch           1       0        NaN
3         megliebsch           1       0        NaN
4         megliebsch           1       0        NaN
...              ...         ...     ...        ...
1220     ARTHCLAUDIA           1       1        NaN
1221     ARTHCLAUDIA           1       1        NaN
1222     ARTHCLAUDIA           1       1        NaN
1223     ARTHCLAUDIA           1       1        NaN
1224  ASSUNCAOWALLAS           1       1        NaN

Trying:

new = knn_res.groupby('username').sum()
print(new)

Yields:

                 Prediction  is_bot
username
666STEVEROGERS           25      25
ADELE_BROCK               1      25
ADRIANAMFTTT             24      25
AHMADRADJAB               1      25
ALBERTA_HAYNESS          24      25
ALTMANBELINDA            23      25
ALVA_MC_GHEE             25      25
ANGELITHSS               25      25
ANN1EMCCONNELL           25      25
ANWARJAMIL22             25      25
AN_N_GASTON              25      25
ARONHOLDEN8              25      25
ARTHCLAUDIA              25      25
ASSUNCAOWALLAS            1       1
BECCYWILL                 9      25
BELOZEROVNIKIT           17      25
BEN_SAR_GENT              1      25
BERT_HENLEY              24      25
BISHOLORINE              25      25
BLACKERTHEBERR5          11      25
BLACKTIVISTSUS            7      25
BLACK_ELEVATION          24      25
BOGDANOVAO2               7      25
BREMENBOTE               10      25
B_stever96                1       0
CALIFRONIAREP            24      25
C_dos_94                 25      24
Cassidygirly             25       0
ChuckSpeaks_             25       0
Cyabooty                  0       0
DurkinSays                1       0
LSU_studyabroad          24       0
MisMonWEXP                0       0
NextLevel_Mel            25       0
PeterDuca                24       0
ShellMarcel              25       0
Sir_Fried_Alott          25       0
XavierRivera_             0       0
ZacharyFlair              0       0
brentvarney44             1       0
cbars68                   0       0
chloeschultz11           25       0
hoang_le_96               1       0
kdougherty178            25       0
lasallephilo              0       0
lovely_cunt_              1       0
megliebsch               24       0
msimps_15                24       0
okweightlossdna          24       0
tankthe_hank             24       0

What am I doing wrong to achieve my desired result?

artemis
  • 6,857
  • 11
  • 46
  • 99

2 Answers2

1

Let us try

pd.crosstab(df.username, df.prediction)
BENY
  • 317,841
  • 20
  • 164
  • 234
1

Groupby both username and prediction to separate same values of of columns username and prediction to groups. prediction 0 and prediction 1 will be separated to different groups for each username. Call count on each group(Note: I changed from is_bot to prediction before the count because that what you want). Finally, unstack to put 0 and 1 to columns and rename them as you want

df_out = (df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
             rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[30]:
prediction  count_human  count_bot
username
foo                   2          3
foo1                  5          2
foo2                  0          3

Step by step:

groupby each group of username and prediction and count on each group of 0, 1 of each username

df.groupby(['username', 'prediction']).prediction.count()

Out[32]:
username  prediction
foo       0             2
          1             3
foo1      0             5
          1             2
foo2      1             3
Name: prediction, dtype: int64

Unstack to put index prediction to columns

df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0)

Out[33]:
prediction  0  1
username
foo         2  3
foo1        5  2
foo2        0  3

Finally, rename columns to match your desired output

(df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
    rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[34]:
prediction  count_human  count_bot
username
foo                   2          3
foo1                  5          2
foo2                  0          3
Andy L.
  • 24,909
  • 4
  • 17
  • 29