Pandas resample by day without filling missing dates

Question

I have a dataset with several date fields including hours. I want to use one of them as my df index, and count the number of entries which where created each day. In other words, if I have:

Date | Several features
2020-02-08 10h00 | ...
2020-02-08 11h00 | ...
2020-02-10 10h00 | ...
2020-02-10 11h00 | ...
2020-02-10 13h00 | ...

I want to get:

2020-02-08 | 2
2020-02-10 | 3

For this, I am doing:

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index('datetime')
df.resample('D')["id"].count()

where id is an unique identifier each entry has.

However, I am getting the following output:

2020-02-08 | 2
2020-02-09 | 0
2020-02-10 | 3

How can I get rid of the "2020-02-09" row? I only want to count the occurences of the days I have on my dataset, not the ones I do not have.

IMO, `resample` is not required. Just a `groupby` with `count()` or `size()` should get you the answer. — Mayank Porwal, Feb 10 '21 at 13:49

jezrael · Accepted Answer · 2021-02-10T14:39:45.190

2

Use Series.dt.date with DataFrame.groupby:

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.groupby(df["datetime"].dt.date)["id"].count()

If need DatetimeIndex working also well in oldier pandas versions use Series.dt.normalize:

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.groupby(df["datetime"].dt.normalize())["id"].count()

In your solution if are generated missing values remove them:

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index('datetime').resample('D')["id"].count().dropna()

Or remove 0 rows:

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index('datetime').resample('D')["id"].count().loc[lambda x: x!= 0]

edited Feb 10 '21 at 14:39

answered Feb 10 '21 at 13:47

jezrael

822,522
95
1,334
1,252

I'd stay away from `dt.date` as Pandas doesn't play very well with `datetime.date`. – Quang Hoang Feb 10 '21 at 14:37
1

@QuangHoang - Ya, in some oldier versions, in new one should working better, but agree, I can add normalize alternative. Thank you. – jezrael Feb 10 '21 at 14:38
Thanks for your answer. If I use your solution and then I want to group by month, is it possible or in that case I should use my approach and go for a `.resample('M')` ? @jezrael – Luiscri Feb 10 '21 at 15:04
1

@Luiscri Tnes is possible use resample solutions from my answers or `df = df.groupby(df["datetime"].dt.to_period('m')) ["id"].count()` – jezrael Feb 10 '21 at 15:31

Pandas resample by day without filling missing dates

1 Answers1