1

I have a df column with dates and hours / minutes:

0   2019-09-13 06:00:00
1   2019-09-13 06:05:00
2   2019-09-13 06:10:00
3   2019-09-13 06:15:00
4   2019-09-13 06:20:00
Name: Date, dtype: datetime64[ns]

I need to count how many days the dataframe contains.

I tried it like this:

sample_length = len(df.groupby(df['Date'].dt.date).first())

and

sample_length = len(df.groupby(df['Date'].dt.date))

But the number I get seems wrong. Do you know another method of counting the days?

Akshat Zala
  • 710
  • 1
  • 8
  • 23
Mark T
  • 145
  • 4
  • 12
  • Did you see this? [How to calculate number of days between two given dates?](https://stackoverflow.com/questions/151199/how-to-calculate-number-of-days-between-two-given-dates) – DavideBrex Jun 20 '20 at 08:21
  • @DavideBrex yes, but I need to count the days how they are in my dataframe, not by calendar. they are not continuous in my df – Mark T Jun 20 '20 at 08:27
  • @MarkT Can you show the expected output? – Shubham Sharma Jun 20 '20 at 08:28
  • @Shubham Sharma how should I do it? If I count the days in the excel file I get 278. I export it as csv and python gives me 195 days. I have no clue where the issue is. I also tried this method `sample_length = df['Date'].dt.date.nunique()` – Mark T Jun 20 '20 at 08:31

4 Answers4

2

Consider the sample dates:

sample = pd.date_range('2019-09-12 06:00:00', periods=50, freq='4h')
df = pd.DataFrame({'date': sample})

                  date
0  2019-09-12 06:00:00
1  2019-09-12 10:00:00
2  2019-09-12 14:00:00
3  2019-09-12 18:00:00
4  2019-09-12 22:00:00
5  2019-09-13 02:00:00
6  2019-09-13 06:00:00
...
47 2019-09-20 02:00:00
48 2019-09-20 06:00:00
49 2019-09-20 10:00:00

Use, DataFrame.groupby to group the dataframe on df['date'].dt.date and use the aggregate function GroupBy.size:

count = df.groupby(df['date'].dt.date).size()

# print(count)
date
2019-09-12    5
2019-09-13    6
2019-09-14    6
2019-09-15    6
2019-09-16    6
2019-09-17    6
2019-09-18    6
2019-09-19    6
2019-09-20    3
dtype: int64
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • So you suggest now to count the length of "count" ? Otherwise I do not understand how your solution helps to solve my problem. the amount of days in "count" is 84 in my dataframe. Can't be right – Mark T Jun 20 '20 at 08:50
  • Its really hard to guess what you are trying to do, unless you provide the desired output? Are you looking for `count.sum()`? – Shubham Sharma Jun 20 '20 at 08:52
  • The dataframe has at the moment a date range from 2019-09-13 to 2020-06-17. 5 minute bin size. Only weekdays Mo-Fr (no sat/sun). 24.12, 25.12, 31.01 excluded. More days will be excluded based on my criteria. As desired output I need simply the number of unique days in this dataframe. `Output: 278` – Mark T Jun 20 '20 at 09:01
  • At the moment I need to know if 195(python) or 278(excel) is correct. One of it seems to count the weekend days? – Mark T Jun 20 '20 at 09:03
  • @MarkT Does your dataframe column currently contains weekdays and `24.12, 25.12, 31.01 `? – Shubham Sharma Jun 20 '20 at 09:40
  • No, I dropped them – Mark T Jun 20 '20 at 09:42
  • Well if you have dropped them, `df['Date'].dt.date.nunique()` should work? – Shubham Sharma Jun 20 '20 at 09:45
  • 1
    yes, I think also my first trials in the question gave me the right results ( same as `df['Date'].dt.date.nunique()` ) and the result in excel is wrong – Mark T Jun 20 '20 at 09:48
1

I'm not completely sure what you want to do here. Do you want to count the number of unique days (Monday/Tuesday/...), monthly dates (1-31 ish), yearly dates (1-365), or unique dates (unique days since the dawn of time)?

From a pandas series, you can use {series}.value_counts() to get the number of entries for each unique value, or simply get all unique values with {series}.unique()

import pandas as pd
df = pd.DataFrame(pd.DatetimeIndex(['2016-10-08 07:34:13', '2015-11-15 06:12:48',
               '2015-01-24 10:11:04', '2015-03-26 16:23:53',
               '2017-04-01 00:38:21', '2015-03-19 03:47:54',
               '2015-12-30 07:32:32', '2015-11-10 20:39:36',
               '2015-06-24 05:48:09', '2015-03-19 16:05:19'],
              dtype='datetime64[ns]', freq=None), columns = ["date"])

days (Monday/Tuesday/...):

df.date.dt.dayofweek.value_counts()

monthly dates (1-31 ish)

df.date.dt.day.value_counts()

yearly dates (1-365)

df.date.dt.dayofyear.value_counts()

unique dates (unique days since the dawn of time)

df.date.dt.date.value_counts()

To get the number of unique entries from any of the above, simply add .shape[0]

1

In order to calculate the total number of unique dates in the given time series data example we can use:

print(len(pd.to_datetime(df['Date']).dt.date.unique()))
Josef
  • 2,869
  • 2
  • 22
  • 23
Pranav Shetty
  • 36
  • 1
  • 3
0
import pandas as pd
df = pd.DataFrame({'Date': ['2019-09-13 06:00:00',
                            '2019-09-13 06:05:00',
                            '2019-09-13 06:10:00',
                            '2019-09-13 06:15:00',
                            '2019-09-13 06:20:00']
                   },
                   dtype = 'datetime64[ns]'
                  )

df = df.set_index('Date')
_count_of_days = df.resample('D').first().shape[0]
print(_count_of_days)
  • Can you explain how this works? This is the answer I need but I'm not sure what the second to last line `_count_of_days = df.resample('D').first().shape[0]` is actually doing. – Ramy May 07 '22 at 17:00