
I used a for loop to find out how many times each ID repeats in the dataframe. Now I need to create a column that holds that total for the respective ID.

In short, I need a column with the repeat total of df['ID']. How do I assign the total from a groupby back to each row?

test = df['ID'].sort_values(ascending=True)

rep = 0
for k in range(0,len(test)-1):
  if(test[k]==test[k+1]):
    rep += 1
    if(k==len(test)-2):
      print(test[k],',', rep+1)
  else:
    print(test[k],',', rep+1)
    rep = 0

out:

> 7614381 , 1 
> 349444 , 5 
> 4577800 , 7
Jorge Fuentes González
  • 11,568
  • 4
  • 44
  • 64
Luan Brito
  • 19
  • 5
  • 1
    It's usually wrong to write `for` loops when using pandas or numpy, because they provide built-in methods that automatically process the entire dataframe or array. – Barmar Oct 06 '22 at 21:07
  • I need a column with the repeat total of df['ID'], how to index a total of the groupby command? For example, a column with the total number of times that given df['ID'] appeared in the dataframe. – Luan Brito Oct 07 '22 at 11:27
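As a rough sketch of what the first comment suggests, the per-ID totals that the loop prints can be obtained without iterating, using `Series.value_counts`. The sample IDs below are made up for illustration; only the column name `ID` comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'ID' is taken from the question
df = pd.DataFrame({'ID': [7614381, 349444, 349444, 4577800, 349444]})

# One entry per distinct ID, holding the number of times it occurs
totals = df['ID'].value_counts()

# Print in the same "ID , total" shape as the loop in the question
for id_value, total in totals.items():
    print(id_value, ',', total)
```

This only yields one total per distinct ID; it does not yet attach the total to every row of the dataframe.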

1 Answer


"For example, a column with the total number of times that given df['ID'] appeared in the dataframe"

Is this what you mean?

import pandas as pd

df = pd.DataFrame(
    {'id': [7614381, 349444, 349444, 4577800, 4577800, 349444, 4577800]}
)

# Count the rows in each 'id' group and broadcast that count back to every row
df["id_count"] = df.groupby('id')['id'].transform('count')

print(df)

Output

        id  id_count
0  7614381         1
1   349444         3
2   349444         3
3  4577800         3
4  4577800         3
5   349444         3
6  4577800         3

based on: https://stackoverflow.com/a/22391554/11323137
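For comparison, here is a minimal sketch of another way to get the same column, assuming the same sample data as above: compute the totals once with `value_counts` and look them up per row with `Series.map`. This is an alternative for illustration, not the approach from the linked answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'id': [7614381, 349444, 349444, 4577800, 4577800, 349444, 4577800]}
)

# value_counts() gives one total per distinct id;
# map() looks up each row's id in those totals
df['id_count'] = df['id'].map(df['id'].value_counts())

# Cross-check against the groupby/transform version from the answer
via_transform = df.groupby('id')['id'].transform('count')
assert (df['id_count'] == via_transform).all()

print(df)
```

Both versions keep the original row order and add one value per row, so either can be assigned directly as a new column.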

hannez
  • 21
  • 4