Count Re-occurrence of a value in python

Question

I have a data set which contains something like this:

SNo  Cookie
1       A
2       A
3       A
4       B
5       C
6       D
7       A
8       B
9       D
10      E
11      D
12      A

So let's say we have 5 cookies: 'A, B, C, D, E'. I want to count how many times each cookie reoccurs after a different cookie has been encountered. In the example above, cookie A is encountered again at the 7th position and then again at the 12th. NOTE: we wouldn't count A at the 2nd position because it directly follows another A, but at positions 7 and 12 other cookies were seen before A came back, so those instances count. So essentially I want something like this:

Sno Cookie  Count
 1     A     2
 2     B     1
 3     C     0
 4     D     2
 5     E     0

Can anyone give me logic or python code behind this?

What your data set looks like in a printed representation is less interesting than a piece of code that sets up a suitable data structure to hold it. This is not the same as making a [mcve], but it is similar and serves similar purposes. – Yunnosch Aug 28 '18 at 20:33
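
For anyone who wants to run the answers below, here is a minimal sketch of such a setup. It assumes the data lives in a pandas DataFrame called `df` with columns `SNo` and `Cookie`, which is what the answers appear to expect; the question itself does not say how the data is stored.

```python
import pandas as pd

# Sample data from the question. The name `df` and the column names
# are assumptions taken from how the answers refer to the data.
df = pd.DataFrame({
    "SNo": range(1, 13),
    "Cookie": list("AAABCDABDEDA"),
})

print(df)
```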

score 3 · Accepted Answer · answered Aug 28 '18 at 20:31

One way to do this would be to first get rid of consecutive repeats of a Cookie, then flag where a Cookie has been seen before using `duplicated`, and finally `groupby` Cookie and take the sum of those flags:

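# Keep only rows whose Cookie differs from the previous row; .copy() avoids a SettingWithCopyWarning below
no_doubles = df[df.Cookie != df.Cookie.shift()].copy()

# True for every remaining row whose Cookie already appeared earlier
no_doubles['dups'] = no_doubles.Cookie.duplicated()

# Number of re-occurrences per Cookie
no_doubles.groupby('Cookie').dups.sum()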

This gives you:

Cookie
A    2.0
B    1.0
C    0.0
D    2.0
E    0.0
Name: dups, dtype: float64
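
To see what each step contributes, the same idea can be written out end to end on the sample data. This is only a sketch: the DataFrame construction and the intermediate name `starts_new_run` are assumptions added for illustration, and `astype(int)` is there just to keep the counts integral regardless of pandas version.

```python
import pandas as pd

df = pd.DataFrame({"SNo": range(1, 13), "Cookie": list("AAABCDABDEDA")})

# Step 1: keep a row only if its cookie differs from the previous row's cookie.
starts_new_run = df["Cookie"] != df["Cookie"].shift()
no_doubles = df[starts_new_run].copy()

# Step 2: flag every remaining row whose cookie already appeared in an earlier run.
no_doubles["dups"] = no_doubles["Cookie"].duplicated()

# Step 3: count the flagged rows per cookie.
counts = no_doubles.groupby("Cookie")["dups"].sum().astype(int)
print(counts)
```

On the sample data, 10 of the 12 rows survive step 1, and the counts come out as A 2, B 1, C 0, D 2, E 0, matching the table in the question.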

sacuL

Hey, thanks. But I think your answer works when a cookie appears 2 times in a row; what if it appears more than 2 times in a row, say 5 times? What would the logic be then? – Kshitij Yadav Aug 28 '18 at 20:34
That will still work, because the code to create `no_doubles` will get rid of consecutive Cookies, regardless of whether there are 2 or 200000 consecutively – sacuL Aug 28 '18 at 20:36
Man! You just saved me my job. That worked so smoothly! Thanks a ton buddy :) – Kshitij Yadav Aug 28 '18 at 20:43
Done! Thank you. Could you also help me in upvoting my answer? I am new here and had 2 bad questions asking by the community. It will help me. Thank you. – Kshitij Yadav Aug 28 '18 at 20:57
You already have my +1 (this I guess was counter-acted by someone else's downvote...) – sacuL Aug 28 '18 at 20:58
Hey could you see this question: https://stackoverflow.com/questions/52083723/count-re-occurrence-of-a-value-in-python-aggregated-with-respect-to-another-valu – Kshitij Yadav Aug 29 '18 at 18:08
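
The point made in the comments, that the length of a run does not matter, can be checked with a small sketch. The helper name `reoccurrences` is hypothetical and not part of the answer; it just wraps the shift-and-`duplicated` idea so that two inputs can be compared.

```python
import pandas as pd

def reoccurrences(cookies):
    """Count how often each cookie starts a new run after its first run."""
    s = pd.Series(cookies)
    runs = s[s != s.shift()]  # first element of every consecutive run
    return runs.duplicated().groupby(runs).sum().astype(int)

# A short run of A and a very long run of A collapse to the same thing.
short = reoccurrences(["A", "A", "B", "A"])
long_ = reoccurrences(["A"] * 200000 + ["B"] + ["A"] * 5)

print(short.to_dict() == long_.to_dict())
```

Both inputs reduce to the runs A, B, A, so A is counted once and B zero times in each case.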

DYZ · Answer 2 · 2018-08-28T21:19:03.373

2

Start by removing consecutive duplicates, then count the survivors and subtract one for each cookie's first appearance:

no_dups = df[df.Cookie != df.Cookie.shift()] # Borrowed from @sacul
no_dups.groupby('Cookie').count() - 1
#        SNo
#Cookie     
#A         2
#B         1
#C         0
#D         2
#E         0

edited Aug 28 '18 at 21:19

answered Aug 28 '18 at 20:39

DYZ, can code like this do the count here? `df.groupby('Cookie').size().reset_index(name='Count')` – Sai Kumar Aug 28 '18 at 21:22
Your code will not eliminate consecutive duplicates. – DYZ Aug 28 '18 at 21:23
DYZ can you help solving me this: https://stackoverflow.com/questions/52083723/count-re-occurrence-of-a-value-in-python-aggregated-with-respect-to-another-valu – Kshitij Yadav Aug 29 '18 at 18:09
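
The difference raised in the comments above can be made concrete with a small comparison. This is a sketch on the question's sample data; the variable names `naive` and `fixed` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"SNo": range(1, 13), "Cookie": list("AAABCDABDEDA")})

# Plain group sizes: every row counts, including immediate repeats.
naive = df.groupby("Cookie").size()

# Drop immediate repeats first, then subtract the first sighting of each cookie.
no_dups = df[df["Cookie"] != df["Cookie"].shift()]
fixed = no_dups.groupby("Cookie").size() - 1

comparison = pd.DataFrame({"naive": naive, "fixed": fixed})
print(comparison)
```

For cookie A the plain group size is 5 because the three leading A rows are all counted, while the de-duplicated version gives the 2 that the question asks for.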

piRSquared · Answer 3 · 2018-08-28T21:26:55.557

`pandas.factorize` and `numpy.bincount`

If immediately repeated values are not counted then remove them.
Do a normal counting of values on what's left.
However, that is one more than what is asked for, so subtract one.

1. `factorize`
2. Filter out immediate repeats
3. `bincount`
4. Produce a `pandas.Series`

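import numpy as np
import pandas as pd

i, r = pd.factorize(df.Cookie)           # integer codes and the unique labels
mask = np.append(True, i[:-1] != i[1:])  # True where a value differs from the one before it
cnts = np.bincount(i[mask]) - 1          # occurrences per code, minus the first sighting

pd.Series(cnts, r)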

A    2
B    1
C    0
D    2
E    0
dtype: int64
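
It may help to look at the intermediate arrays. The sketch below runs the same steps on the question's sample data, using longer variable names (`codes`, `uniques`, `mask`) that are chosen here for readability and are not from the answer.

```python
import numpy as np
import pandas as pd

cookies = pd.Series(list("AAABCDABDEDA"))

# One integer per row, numbered in order of first appearance: A=0, B=1, C=2, D=3, E=4.
codes, uniques = pd.factorize(cookies)

# False only at positions 1 and 2 (SNo 2 and 3), the immediate repeats of A.
mask = np.append(True, codes[:-1] != codes[1:])

# Count the surviving codes, then drop each cookie's first sighting.
counts = np.bincount(codes[mask]) - 1

result = pd.Series(counts, index=uniques)
print(result)
```

Because `factorize` numbers the labels in order of first appearance, position `k` of the `bincount` output lines up with `uniques[k]`, which is why the two can be combined into a Series directly.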

`pandas.value_counts`

Zip the cookies with their lagged selves, pulling out the non-repeats:

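c = df.Cookie.tolist()

# Keep a only when it differs from its predecessor b, then count and subtract the first sighting.
# Newer pandas versions deprecate top-level pd.value_counts; pd.Series(...).value_counts() is the equivalent.
pd.value_counts([a for a, b in zip(c, [None] + c) if a != b]).sort_index() - 1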

A    2
B    1
C    0
D    2
E    0
dtype: int64
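
The lagged zip is easier to follow when the pairs are materialised. The following sketch does that on the sample data; `pairs` and `run_starts` are names introduced here, and it uses `pd.Series(...).value_counts()` rather than the top-level function.

```python
import pandas as pd

c = list("AAABCDABDEDA")

# Pair each cookie with the one before it; None stands in for
# "nothing comes before the first cookie".
pairs = list(zip(c, [None] + c))

# Keep a cookie only when it differs from its predecessor.
run_starts = [a for a, b in pairs if a != b]

counts = pd.Series(run_starts).value_counts().sort_index() - 1
print(counts)
```

`zip` stops at the shorter input, so the extra trailing element of `[None] + c` is ignored and every cookie gets exactly one predecessor.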

`defaultdict`

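from collections import defaultdict

def count(s):
    d = defaultdict(lambda: -1)  # start at -1 so a cookie's first sighting brings it to 0
    x = None                     # previous value
    for y in s:
        d[y] += y != x           # True counts as 1: add it only when the value changes
        x = y

    return pd.Series(d)

count(df.Cookie)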

A    2
B    1
C    0
D    2
E    0
dtype: int64
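
The `d[y] += y != x` line relies on a boolean being added as 0 or 1. For readers who prefer that spelled out, here is a plain-Python sketch of the same loop with an explicit `if`. The function name `count_reoccurrences` and the sentinel object are choices made for this illustration, and it returns a dict instead of a Series.

```python
from collections import defaultdict

def count_reoccurrences(values):
    """Plain-Python variant: returns a dict instead of a pandas Series."""
    counts = defaultdict(lambda: -1)  # -1 so that a cookie's first run lands on 0
    previous = object()               # sentinel that equals no real value
    for value in values:
        if value != previous:         # a new run starts here
            counts[value] += 1
        previous = value
    return dict(counts)

result = count_reoccurrences("AAABCDABDEDA")
print(result)
```

It works on any iterable, so a string, a list, or `df.Cookie` can be passed in without changes.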
