Count identical dates in pandas dataframe

Question

I have a dataframe with a date column and I would like to create a new column that tells me how many times each date occurs in the dataset. This is a minimal example of the original data set:

df1:

I would like to create this date_count column, so the target data set is:

df1:

date         date_count
2017/01/03     2
2017/01/03     2
2017/01/04     3
2017/01/04     3
2017/01/04     3
2017/01/05     1

The actual code to create df1:

import pandas as pd

# One record per row; date_count is the target column described above.
dict1 = [{'date': '2017/01/03', 'date_count': 2}, {'date': '2017/01/03', 'date_count': 2},
         {'date': '2017/01/04', 'date_count': 3}, {'date': '2017/01/04', 'date_count': 3},
         {'date': '2017/01/04', 'date_count': 3}, {'date': '2017/01/05', 'date_count': 1}]
df = pd.DataFrame(dict1, index=['s1', 's2', 's3', 's1', 's2', 's3'])

I do now. But that would only give me an output which lists the occurrences, right? – Niccola Tartaglia, Jul 13 '18 at 19:40
You can just use a `groupby` and then `pd.merge` or `transform`. – rpanai, Jul 13 '18 at 19:43
As I suggested, but he should tell us which one is his original df. – rpanai, Jul 13 '18 at 19:46
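
The `groupby` plus `pd.merge` route mentioned in the comment above is not spelled out anywhere in the thread, so here is a minimal sketch of how it might look. The starting frame (date column only, default integer index) and the names `counts` and `merged` are assumptions made for illustration.

```python
import pandas as pd

# Assumed starting frame: only the date column, as described in the question.
df = pd.DataFrame(
    {"date": ["2017/01/03", "2017/01/03", "2017/01/04",
              "2017/01/04", "2017/01/04", "2017/01/05"]}
)

# One row per distinct date together with its number of occurrences.
counts = df.groupby("date").size().reset_index(name="date_count")

# A left merge copies each count back onto every row with that date.
merged = df.merge(counts, on="date", how="left")
print(merged)
```

Note that `merge` returns a frame with a fresh integer index, so with a labelled index like `s1, s2, s3` the `map` or `transform` approaches in the answers are probably more convenient.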

sacuL · Answer 1 · 2018-07-13T19:54:44.623

6

Here is another method using map along with a groupby and size:

>>> df
          date
s1  2017/01/03
s2  2017/01/03
s3  2017/01/04
s1  2017/01/04
s2  2017/01/04
s3  2017/01/05

df['date_count'] = df.date.map(df.groupby('date').size())

>>> df
          date  date_count
s1  2017/01/03           2
s2  2017/01/03           2
s3  2017/01/04           3
s1  2017/01/04           3
s2  2017/01/04           3
s3  2017/01/05           1
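
To see why the one-liner works, it may help to look at the intermediate result. The sketch below rebuilds the frame from the question (the variable name `sizes` is only for illustration) and splits the call into its two steps.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": ["2017/01/03", "2017/01/03", "2017/01/04",
              "2017/01/04", "2017/01/04", "2017/01/05"]},
    index=["s1", "s2", "s3", "s1", "s2", "s3"],
)

# Lookup table: the index holds the distinct dates, the values the row counts.
sizes = df.groupby("date").size()
print(sizes)

# map() replaces each date with the value stored under that date in `sizes`,
# so the result keeps the original (non-unique) row labels.
df["date_count"] = df["date"].map(sizes)
print(df)
```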

edited Jul 13 '18 at 19:54

answered Jul 13 '18 at 19:50


this works perfectly!!! thanks – Niccola Tartaglia Jul 13 '18 at 19:53
Is this significantly faster or slower than just transform? It's certainly safer than `transform('size')` in the case of sending it an empty DataFrame, but `.transform('count')` doesn't seem to suffer from that. – ALollz Jul 13 '18 at 19:54
It looks faster to me: `523 µs ± 17.4 µs` vs `1.88 ms ± 39.4 µs`. It will be interesting to see how it works for a bigger df. – rpanai Jul 13 '18 at 19:57
On small dataframes `map` will be faster, but this will not be the case on very large dataframes; it will probably switch around 50k rows. `transform` will become faster earlier based on the change I just made – user3483203 Jul 13 '18 at 19:58
@user3483203 That's exactly what I found – sacuL Jul 13 '18 at 20:03
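
For anyone who wants to check the crossover on their own machine, here is a rough timing sketch. The helper names (`with_map`, `with_transform`, `make_frame`), the row counts and the synthetic dates are all assumptions; the numbers will depend on the pandas version and hardware, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd


def with_map(df):
    # Answer 1: look up each date in the per-date size table.
    return df["date"].map(df.groupby("date").size())


def with_transform(df):
    # Answer 2: broadcast the per-group count back to every row.
    return df.groupby("date")["date"].transform("count")


def make_frame(n_rows, seed=0):
    # Synthetic data: random draws from 28 date strings (hypothetical).
    rng = np.random.default_rng(seed)
    days = [f"2017/01/{d:02d}" for d in range(1, 29)]
    return pd.DataFrame({"date": rng.choice(days, size=n_rows)})


for n_rows in (100, 100_000):
    frame = make_frame(n_rows)
    t_map = timeit.timeit(lambda: with_map(frame), number=20)
    t_transform = timeit.timeit(lambda: with_transform(frame), number=20)
    print(f"{n_rows:>7} rows  map: {t_map:.4f}s  transform: {t_transform:.4f}s")
```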

user3483203 · Answer 2 · 2018-07-13T20:00:28.163

3

Using count with transform

df['count'] = df.groupby('date')['date'].transform('count')

         date  count
0  2017/01/03      2
1  2017/01/03      2
2  2017/01/04      3
3  2017/01/04      3
4  2017/01/04      3
5  2017/01/05      1
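
The output above has a default integer index. As a self-contained sketch, assuming the string index from the question, the same call also works there, because `transform` returns a result aligned to the original rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"date": ["2017/01/03", "2017/01/03", "2017/01/04",
              "2017/01/04", "2017/01/04", "2017/01/05"]},
    index=["s1", "s2", "s3", "s1", "s2", "s3"],
)

# Select the grouped column itself so there is something to count,
# then broadcast each group's count back to its rows.
df["count"] = df.groupby("date")["date"].transform("count")
print(df)
```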

edited Jul 13 '18 at 20:00

answered Jul 13 '18 at 19:52


I'm wondering why it is not working without `.reset_index()`. – rpanai Jul 13 '18 at 19:53
Because there is only a single column, so once you groupby it has nothing to count, `reset_index` gives it a column to aggregate – user3483203 Jul 13 '18 at 19:53
Thanks. I never used `df` with one column only. – rpanai Jul 13 '18 at 19:55
@user32185 I didn't realize you could index the column you grouped by, which significantly speeds up this method. – user3483203 Jul 13 '18 at 20:02
