I have a dataset with several date fields that include hours. I want to use one of them as my DataFrame index and count the number of entries that were created each day. In other words, if I have:
Date | Several features
2020-02-08 10h00 | ...
2020-02-08 11h00 | ...
2020-02-10 10h00 | ...
2020-02-10 11h00 | ...
2020-02-10 13h00 | ...
I want to get:
2020-02-08 | 2
2020-02-10 | 3
For this, I am doing:
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index('datetime')
df.resample('D')["id"].count()
where id is a unique identifier that each entry has.
However, I am getting the following output:
2020-02-08 | 2
2020-02-09 | 0
2020-02-10 | 3
How can I get rid of the "2020-02-09" row? I only want to count the occurrences for the days that are present in my dataset, not for the ones that are missing.
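
For reference, here is a minimal reproducible sketch of the situation, together with two possible ways around it. The sample values, the "10:00" time format, and the variable names (resampled, nonzero, per_day) are assumptions made for illustration, not taken from the real dataset. As far as I understand, resample('D') builds a regular daily grid between the first and last timestamp, which is why the empty day shows up with a count of 0; grouping by the calendar day of the index should only produce groups for days that actually occur.

```python
import pandas as pd

# Hypothetical sample data mirroring the example in the question
df = pd.DataFrame({
    "datetime": [
        "2020-02-08 10:00", "2020-02-08 11:00",
        "2020-02-10 10:00", "2020-02-10 11:00", "2020-02-10 13:00",
    ],
    "id": [1, 2, 3, 4, 5],
})
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime")

# resample creates one bin per calendar day, including days with no rows
resampled = df.resample("D")["id"].count()

# Possible workaround 1: keep resample and drop the empty bins afterwards
nonzero = resampled[resampled > 0]

# Possible workaround 2: group by the index truncated to midnight,
# so only days that are present in the data form a group
per_day = df.groupby(df.index.normalize())["id"].count()

print(per_day)
```

Both nonzero and per_day should contain only the two days that have entries. The groupby variant avoids creating the empty bins in the first place, which may matter if the data spans long gaps.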