Prompted by another post, here is my problem. I have this DataFrame:

   col
0  B
1  B
2  A
3  A
4  A
5  B

and I need this output:

   col col_frequencies
0  B   1
1  B   2
2  A   1 
3  A   2
4  A   3
5  B   3

# The value in row 5 (B, 3) continues the count from the earlier B rows. I do not want the frequency counter to be reset when a value reappears.

Something like a running COUNTIF in Excel.

Thanks in advance from a total beginner, G.

3 Answers

You can use the pandas value_counts() function to get the frequency of each value.
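As a rough illustration of what this returns, here is a minimal sketch assuming the question's df with a single column named col (the column name col_total is made up for the example). Note that value_counts() gives the total number of occurrences per value, not the running count asked for in the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# Total number of occurrences of each value over the whole column
totals = df["col"].value_counts()

# Map the totals back onto the rows (hypothetical column name "col_total").
# Every row receives the final total for its value, so no running counter appears.
df["col_total"] = df["col"].map(totals)
print(df)
```

So this is useful for overall frequencies, but a per-row counter needs something that keeps track of row order.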

Shubh Patni

You can do this in two stages:

  1. Group all rows with the same col value. This can be done using groupby().

  2. Get the position of each row within its group. You do this with cumcount(), which starts from zero, so add 1 to it.

All in one:

df['col_frequencies'] = df.groupby('col').cumcount() + 1
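To make the two stages visible, the one-liner can be split up as in the sketch below. It assumes the question's df with a column named col; the intermediate names groups and position are only illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# Stage 1: group the rows that share the same value in "col"
groups = df.groupby("col")

# Stage 2: number the rows inside each group in their original order (0, 1, 2, ...)
position = groups.cumcount()

# Shift to a 1-based counter so the first occurrence of a value gets 1
df["col_frequencies"] = position + 1
print(df)
```

Because cumcount() keeps the original row index, the result lines up with df without any sorting or merging.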

For example (apologies for the lazy column names):

import pandas as pd

df = pd.DataFrame(['B', 'B', 'A', 'A', 'A', 'B'])
# Running count of each value in column 0, starting at 1
df['Col'] = df.groupby([0]).cumcount() + 1
print(df)

Output:

    0   Col
0   B   1
1   B   2
2   A   1
3   A   2
4   A   3
5   B   3
Roim

This should solve your problem.

Let's say your DataFrame is named df.

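res = {}  # occurrences seen so far for each value
r = []    # running count for each row, in row order
for i, row in df.iterrows():
    # Increment the counter for this value, starting from 0 if it is new
    res[row['col']] = res.get(row['col'], 0) + 1
    r.append(res[row['col']])

df['col_frequencies'] = r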

The output will be:

   col col_frequencies
0  B   1
1  B   2
2  A   1 
3  A   2
4  A   3
5  B   3
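A variant of the same idea, sketched under the assumption that only the col column is needed: iterate over the column's values directly instead of calling iterrows(), and let a defaultdict supply the starting count. The names seen and running are made up for the example, and the last line simply checks the loop against the groupby/cumcount approach from the other answer:

```python
from collections import defaultdict

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

seen = defaultdict(int)  # value -> number of occurrences so far
running = []             # running count for each row, in row order
for value in df["col"]:
    seen[value] += 1
    running.append(seen[value])

df["col_frequencies"] = running

# Cross-check against the vectorised groupby/cumcount result
matches = df["col_frequencies"].equals(df.groupby("col").cumcount() + 1)
print(df)
print(matches)
```

For a small frame either version is fine; on large data the groupby version is likely to be faster because it avoids a Python-level loop.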
Dhaval Taunk