Count the number of observations that occur per day

Question

I have a pandas dataframe indexed by time. I want to know the total number of observations (i.e. dataframe rows) that happen each day.

Here is my dataframe:

import pandas as pd

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'], 
        'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data, columns = ['date', 'value'])
print(df)

What I want is a dataframe (or series) that looks like this:

   date             value
0  2014-05-01       2
1  2014-05-02       4
2  2014-05-03       2
3  2014-05-04       2

After reading a bunch of StackOverflow questions, the closest I can get is:

df['date'].groupby(df.index.map(lambda t: t.day))

But that doesn't produce anything of use.

score 6 · Answer 1 · answered May 08 '14 at 00:43

Use resampling. The date column needs to be of datetime dtype (as is, it holds strings), and it needs to be set as the index before you can resample.

In [13]: df['date'] = pd.to_datetime(df['date'])

In [14]: df.set_index('date').resample('D').count()
Out[14]: 
            value
date             
2014-05-01      2
2014-05-02      4
2014-05-03      2
2014-05-04      2

You can use any arbitrary function or built-in convenience functions given as strings, including 'count' and 'sum'.
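In recent pandas versions the aggregation is chained onto the resampler as a method (or passed to `.agg`) rather than given as an argument to `resample` itself. The following is a minimal sketch under that assumption, using the data from the question; the variable names `daily`, `counts` and `sums` are only illustrative.

```python
import pandas as pd

# Sample data from the question: 2, 4, 2 and 2 rows on four consecutive days.
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
                 '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
                 '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
                 '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
                 '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data)

# Resampling needs real datetimes in the index, not strings.
df['date'] = pd.to_datetime(df['date'])
daily = df.set_index('date').resample('D')

# Built-in aggregation called as a method on the resampler.
counts = daily['value'].count()

# The same idea with the aggregation named by string.
sums = daily['value'].agg('sum')

print(counts)
```

One design point worth knowing: resampling produces a row for every calendar day between the first and last timestamp, so a day with no observations shows up with a count of 0 instead of being left out.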

score 1 · Answer 2 · edited May 23 '17 at 11:52

Wow, @Jeff wins:

df.resample('D').count()

My worse answer:

The first problem is that your date column is strings, not datetimes. Using code from this thread:

import dateutil.parser
df['date'] = df['date'].apply(dateutil.parser.parse)

Then it's trivial, and you had the right idea:

grouped = df.groupby(df['date'].apply(lambda x: x.date()))
grouped['value'].count()
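The same grouping can be sketched without `apply`, assuming the column is converted with `pd.to_datetime` and the calendar date is taken through the `.dt` accessor. The name `per_day` is only illustrative.

```python
import pandas as pd

# Sample data from the question.
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
                 '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
                 '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
                 '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
                 '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data)

# Vectorised string-to-datetime conversion.
df['date'] = pd.to_datetime(df['date'])

# Group by the calendar date of each timestamp and count the rows in each group.
per_day = df.groupby(df['date'].dt.date)['value'].count()

print(per_day)
```

Note that `count()` skips missing values in the counted column; if `value` could contain NaN and every row should be counted, `size()` is the safer choice.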

answered May 08 '14 at 00:32 · exp1orer

Might be safer to use `pd.to_datetime(df['date'])`. Applying is probably slower too. – Dan Allan May 08 '14 at 00:39

I was writing an answer with resample while Jeff made that comment. See my answer. I think you want 'count' instead of 'sum'. – Dan Allan May 08 '14 at 00:44

jedwards · Answer 3 · 2014-05-08T00:44:37.813

I know nothing about pandas, but in Python you could do something like:

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

import datetime
dates = [datetime.datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d') for ts in data['date']]

cnt = {}
for d in dates:
    cnt[d] = cnt.get(d, 0) + 1

for i, k in enumerate(sorted(cnt)):
    print("%d  %s  %d" % (i,k,cnt[k]))

Which would output:

0  2014-05-01  2
1  2014-05-02  4
2  2014-05-03  2
3  2014-05-04  2

If you didn't care about parsing and reformatting your datetime strings, I suppose something like

dates = [d[0:10] for d in data['date']]

could replace the longer dates=... line, but it seems less robust.
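The manual counting loop can also be sketched with `collections.Counter` from the standard library, which does the tallying itself. This is only an alternative under the same assumption about the timestamp format; `timestamps` and `day_counts` are illustrative names.

```python
from collections import Counter
from datetime import datetime

# Timestamp strings from the question.
timestamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
              '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
              '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
              '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
              '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']

# Parse each string, keep only the calendar date, and tally the occurrences.
day_counts = Counter(
    datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f').date().isoformat()
    for ts in timestamps
)

# ISO dates sort chronologically when sorted as strings.
for i, day in enumerate(sorted(day_counts)):
    print("%d  %s  %d" % (i, day, day_counts[day]))
```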

score 0 · Answer 4 · answered May 08 '14 at 00:44

As exp1orer mentions, you'll need to convert the date strings to datetimes. Or, if you simply want to count observations and don't care about the date format, you can take the first 10 characters of the date column and then use the value_counts() method. (Personally, I prefer this to groupby + sum for simple observation counts.)

You can achieve what you need with a one-liner:

In [93]: df.date.str[:10].value_counts()
Out[93]: 
2014-05-02    4
2014-05-04    2
2014-05-01    2
2014-05-03    2
dtype: int64
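`value_counts()` orders its result by count, not by date. If chronological order matters, one option is to sort the result by its index. A minimal sketch, with `by_day` as an illustrative name:

```python
import pandas as pd

# Sample data from the question; the dates stay as plain strings here.
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
                 '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
                 '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
                 '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
                 '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'value': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data)

# Slice off the 'YYYY-MM-DD' prefix, count occurrences, then sort by date string.
by_day = df['date'].str[:10].value_counts().sort_index()

print(by_day)
```

This relies on every string starting with a zero-padded ISO date, so that the first 10 characters are the day and string order matches date order.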
