How to include end date in pandas date_range method?

Question

From pd.date_range('2016-01', '2016-05', freq='M', ).strftime('%Y-%m'), the last month is 2016-04, but I was expecting it to be 2016-05. It seems to me this function is behaving like the range method, where the end parameter is not included in the returning array.

Is there a way to get the end month included in the returning array, without processing the string for the end month?

`date_range()` parameters still seem to be a bit tricky: https://github.com/pandas-dev/pandas/issues/16354 – arturomp, May 30 '18 at 20:18

score 16 · Accepted Answer · answered Jun 17 '16 at 21:24


Here is a way to do it without having to work out the month ends yourself:

pd.date_range(*(pd.to_datetime(['2016-01', '2016-05']) + pd.offsets.MonthEnd()), freq='M')

DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
           '2016-05-31'],
          dtype='datetime64[ns]', freq='M')
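
Spelled out step by step, the one-liner above might look like the following sketch. The variable names are illustrative, and the offset object is passed as freq instead of the 'M' alias, on the assumption that the alias spelling can differ between pandas versions.

```python
import pandas as pd

# Parse the 'YYYY-MM' strings; each becomes the first day of its month.
bounds = pd.to_datetime(['2016-01', '2016-05'])

# Roll both endpoints forward to the end of their month.
start, end = bounds + pd.offsets.MonthEnd()

# Both endpoints now sit on month ends, so the last month is included.
idx = pd.date_range(start, end, freq=pd.offsets.MonthEnd())
months = idx.strftime('%Y-%m').tolist()
print(months)
```

Note that MonthEnd() moves a date that is already a month end to the following month end; if that can happen with your inputs, MonthEnd(0) leaves such dates in place.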


piRSquared


With this solution I don't need to mess with days and (n+1) month. – srodriguex Jun 17 '16 at 23:04

root · Answer 2 · 2016-06-17T22:08:03.373

13

You can use .union to add the next logical value after initializing the date_range. It should work as written for any frequency:

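d = pd.date_range('2016-01', '2016-05', freq='M')
# Add one frequency step; plain integer addition (d[-1] + 1) is rejected by recent pandas versions
d = d.union([d[-1] + d.freq]).strftime('%Y-%m')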

Alternatively, you can use period_range instead of date_range. Depending on what you intend to do, this might not be the right thing to use, but it satisfies your question:

pd.period_range('2016-01', '2016-05', freq='M').strftime('%Y-%m')

In either case, the resulting output is as expected:

['2016-01' '2016-02' '2016-03' '2016-04' '2016-05']
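
If you go the period_range route but still need timestamps afterwards, a sketch along these lines may help. It assumes month-start timestamps are acceptable, and the variable names are only illustrative.

```python
import pandas as pd

# A PeriodIndex covers whole months, so both endpoints are included.
periods = pd.period_range('2016-01', '2016-05', freq='M')
labels = periods.strftime('%Y-%m').tolist()

# Convert back to timestamps; by default each period maps to its start.
month_starts = periods.to_timestamp()
print(labels)
print(month_starts)
```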

edited Jun 17 '16 at 22:08

answered Jun 17 '16 at 21:29

root


4

Thanks for period_range, that's what I was looking for. – Tickon Mar 27 '17 at 11:03
`.union` is also a solution to a similar problem: you want monthly intervals, including your endpoints, but your start and end do not fall on the beginning/end of the month, e.g. `start=pd.to_datetime('2016-01-05')`, `finish=pd.to_datetime('2016-05-13')`, `d=pd.date_range(start, finish, freq='M').union([start, finish])`. It even sorts the index for you. – Jake Stevens-Haas Jun 19 '19 at 17:51
1

getting this error now: TypeError: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq` – Rafael Nov 09 '22 at 00:18
1

@Rafael I got that error too; an easy (and, IMO, more readable) workaround is `pd.date_range('2016-01', '2016-05', freq='M').strftime('%Y-%m').union(['2016-05'])` – shiftyscales Dec 14 '22 at 16:25

score 12 · Answer 3 · answered Jul 26 '17 at 10:48


For the later crowd. You can also try to use the Month-Start frequency.

>>> pd.date_range('2016-01', '2016-05', freq='MS', format = "%Y-%m" )
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
               '2016-05-01'],
              dtype='datetime64[ns]', freq='MS')
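
For pandas versions where date_range does not accept a format argument, a minimal sketch of the same idea formats the result with .strftime instead (the variable names are illustrative):

```python
import pandas as pd

# 'MS' generates month starts; 2016-05-01 equals the end bound, so it is kept.
idx = pd.date_range('2016-01', '2016-05', freq='MS')

# Format afterwards rather than passing a format to date_range.
months = idx.strftime('%Y-%m').tolist()
print(months)
```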


zwep


works great with pandas 1.3.4 except the format argument which needed to be replaced by `.strftime('%Y-%m')` – shiftyscales Dec 14 '22 at 16:25

score 1 · Answer 4 · answered Jun 17 '16 at 21:14


Include the day when specifying the dates in the date_range call:

pd.date_range('2016-01-31', '2016-05-31', freq='M', ).strftime('%Y-%m')

array(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05'], 
      dtype='|S7')


piRSquared


If you add a day manually, then you might as well just add one more month, don't you think? – ℕʘʘḆḽḘ Jun 17 '16 at 21:16
No. I can imagine a need and reason for preferring either depending on the situation. – piRSquared Jun 17 '16 at 21:20

Brian Hardin · Answer 5 · 2020-03-10T16:22:26.303

I had a similar problem when using datetime objects in a DataFrame. I set the boundaries with .min() and .max() and then filled in the missing dates using pd.date_range. Unfortunately, the returned index was missing the maximum value.

I found two workarounds for this:

1) Pass "closed = None" to pd.date_range (newer pandas versions replace this parameter with inclusive). This worked in the example below; however, it didn't work for me when working only with dataframes (no idea why).

2) If option #1 doesn't work, add one extra unit (in this case a day) to the end using datetime.timedelta(). In the example below this overshoots by a day, but it can help when date_range isn't giving you the full range.

import pandas as pd
import datetime as dt 

#List of dates as strings
time_series = ['2020-01-01', '2020-01-03', '2020-01-05', '2020-01-06', '2020-01-07']

#Creates dataframe with time data that is converted to datetime object 
raw_data_df = pd.DataFrame(pd.to_datetime(time_series), columns = ['Raw_Time_Series'])

#Creates an indexed_time list that includes missing dates and the full time range

#Option No. 1 is to use the closed = None parameter choice.
#(Newer pandas versions replace closed with inclusive, e.g. inclusive='both'.)
indexed_time = pd.date_range(start = raw_data_df.Raw_Time_Series.min(),end = raw_data_df.Raw_Time_Series.max(),freq='D',closed= None)
print('indexed_time option #1 = ', indexed_time)

#Option No. 2 if the function allows you to extend the time by one unit (in this case day) 
#by using the datetime.timedelta function to get what you need. 
indexed_time = pd.date_range(start = raw_data_df.Raw_Time_Series.min(),end = raw_data_df.Raw_Time_Series.max()+dt.timedelta(days=1),freq='D')
print('indexed_time option #2 = ', indexed_time)

#In this case you overshoot by an extra day, because date_range already covers the full range here.
#However, if the "closed = None" parameter doesn't cover the full range, this is a good workaround.
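
One possible cause of the missing maximum is that the real data carries time-of-day components. This is only an assumption, since the answer does not say. With freq='D', date_range steps from the start timestamp in whole days, so an end with an earlier time of day is never reached. The sketch below uses made-up timestamps to show the effect and one way around it with Timestamp.normalize().

```python
import pandas as pd

# Hypothetical data with times of day (not taken from the answer above).
times = pd.to_datetime(['2020-01-01 08:00', '2020-01-03 12:30', '2020-01-07 06:00'])

# Daily steps from 08:00 never land on or before 2020-01-07 06:00,
# so the last calendar day is missing.
raw = pd.date_range(times.min(), times.max(), freq='D')

# Dropping the time of day from both bounds restores the full range.
full = pd.date_range(times.min().normalize(), times.max().normalize(), freq='D')

print(raw[-1])
print(full[-1])
```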

ℕʘʘḆḽḘ · Answer 6 · 2016-06-17T21:39:35.823

0

I don't think so. You need to add the (n+1) boundary:

   pd.date_range('2016-01', '2016-06', freq='M' ).strftime('%Y-%m')

The start and end dates are strictly inclusive. So it will not generate any dates outside of those dates if specified. http://pandas.pydata.org/pandas-docs/stable/timeseries.html

Either way, you have to manually add some information. I believe adding just one more month is not a lot of work.
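
If you would rather not edit the end string by hand, the (n+1) boundary can be computed with an offset. This is a minimal sketch, not part of the original answer. The names are illustrative, and the offset object is passed as freq on the assumption that the 'M' alias spelling can differ between pandas versions.

```python
import pandas as pd

end = pd.Timestamp('2016-05')                   # parsed as 2016-05-01
next_boundary = end + pd.offsets.MonthBegin(1)  # first day of the next month

# Month ends up to the next boundary now include the end of May.
idx = pd.date_range('2016-01', next_boundary, freq=pd.offsets.MonthEnd())
months = idx.strftime('%Y-%m').tolist()
print(months)
```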

edited Jun 17 '16 at 21:39

answered Jun 17 '16 at 21:11

ℕʘʘḆḽḘ


This quote from the docs holds true when `freq='D'`; when it is monthly, it doesn't hold for the end date, though. – srodriguex Jun 17 '16 at 22:54
ok. it does not change the fact that you need to add your boundary :) – ℕʘʘḆḽḘ Jun 17 '16 at 22:57

jpm · Answer 7 · 2019-02-20T18:35:20.467

The explanation for this issue is that the function pd.to_datetime() converts a '%Y-%m' date string by default to the first of the month datetime, or '%Y-%m-01':

>>> pd.to_datetime('2016-05')
Timestamp('2016-05-01 00:00:00')
>>> pd.date_range('2016-01', '2016-02')
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
               '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
               '2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
               '2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
               '2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
               '2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01'],
              dtype='datetime64[ns]', freq='D')

Then everything follows from that. Specifying freq='M' returns the month ends between 2016-01-01 and 2016-05-01, which is the list you receive; it excludes 2016-05-31 because that date falls after the end bound. Specifying month starts with 'MS', as another answer here suggests, includes 2016-05-01 because it falls within the range. The default behavior of pd.date_range() isn't like the range method, since both ends are included. From the docs:

closed controls whether to include start and end that are on the boundary. The default includes boundary points on either end.
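
A short sketch to check this reasoning yourself. The variable names are illustrative, and the MonthEnd offset object stands in for the 'M' alias used in the question.

```python
import pandas as pd

# The end string is parsed as the first day of the month.
end = pd.to_datetime('2016-05')

# Month ends that fall inside [2016-01-01, 2016-05-01].
month_ends = pd.date_range('2016-01', '2016-05', freq=pd.offsets.MonthEnd())
last = month_ends[-1]

print(end)   # first of May
print(last)  # end of April, the last month end not after the end bound
```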
