How does one count the number of previous rows in a dataframe that contain the same value as a cell in that row?

Given a dataframe, e.g.:

In [1]: df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo', 'foo', 'bar', 'baz', 'foo', 'foo']})

In [2]: df1
Out[2]:

    lkey
0   foo
1   bar
2   baz
3   foo
4   foo
5   bar
6   baz
7   foo
8   foo

I would like to add a column containing, for each row, the number of times that row's lkey value appears in lkey in all previous rows of the dataframe.

I have a dataframe of shape roughly 100000 x 15, and my attempt at a for loop was useless.

The desired output would produce:

In [3]: df1['lkeyCount'] = (number of times lkey appears in previous rows in lkey column)
Out[3]:

    lkey    lkeyCount
0   foo     0
1   bar     0
2   baz     0
3   foo     1
4   foo     2
5   bar     1
6   baz     1
7   foo     3
8   foo     4
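
For reference, a minimal sketch of one way this output might be produced without an explicit loop, assuming the built-in pandas `groupby(...).cumcount()` is acceptable here. It numbers the rows of each group from 0 in their original order, which for this example matches the count of earlier rows with the same lkey. This is only one possible approach, and it has not been timed on a 100000 x 15 frame.

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo', 'foo',
                             'bar', 'baz', 'foo', 'foo']})

# Group rows by their lkey value; cumcount() then labels each row with its
# 0-based position inside its group, keeping the original row order.
# That position equals the number of earlier rows sharing the same lkey.
df1['lkeyCount'] = df1.groupby('lkey').cumcount()

print(df1)
```

Because the result is aligned to the original index, it can be assigned directly as a new column without re-sorting.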

Thanks in advance!

bonz