110

I'm trying to generate a date range of monthly data where the day is always at the beginning of the month:

pd.date_range(start='1/1/1980', end='11/1/1991', freq='M')

This generates 1/31/1980, 2/29/1980, and so on. Instead, I just want 1/1/1980, 2/1/1980,...

I've seen other question ask about generating data that is always on a specific day of the month, with answers saying it wasn't possible, but beginning of month surely must be possible!

Dimitris Fasarakis Hilliard
  • 150,925
  • 31
  • 268
  • 253
Bunny_Ross
  • 1,378
  • 2
  • 10
  • 9

2 Answers2

203

You can do this by changing the freq argument from 'M' to 'MS':

d = pandas.date_range(start='1/1/1980', end='11/1/1990', freq='MS')    
print(d)

This should now print:

DatetimeIndex(['1980-01-01', '1980-02-01', '1980-03-01', '1980-04-01',
               '1980-05-01', '1980-06-01', '1980-07-01', '1980-08-01',
               '1980-09-01', '1980-10-01', 
               ...
               '1990-02-01', '1990-03-01', '1990-04-01', '1990-05-01',
               '1990-06-01', '1990-07-01', '1990-08-01', '1990-09-01',
               '1990-10-01', '1990-11-01'],
              dtype='datetime64[ns]', length=131, freq='MS', tz=None)

Look into the offset aliases part of the documentation. There it states that 'M' is for the end of the month (month end frequency) while 'MS' for the beginning (month start frequency).

Dimitris Fasarakis Hilliard
  • 150,925
  • 31
  • 268
  • 253
13

It is worth noting that pandas.date_range() only includes dates within the defined interval, which may not be expected :

start = "2020-03-08"
end = "2021-03-08"
pd.date_range(start, end, freq='MS')

results in

DatetimeIndex(['2020-04-01', '2020-05-01', '2020-06-01', '2020-07-01',
           '2020-08-01', '2020-09-01', '2020-10-01', '2020-11-01',
           '2020-12-01', '2021-01-01', '2021-02-01', '2021-03-01'],
          dtype='datetime64[ns]', freq='MS')

For MS, a workaround to include the first day of the opening month is to work only with the year and month of the start date :

pd.date_range(start[:7], end, freq='MS')

will then give

DatetimeIndex(['2020-03-01', '2020-04-01', '2020-05-01', '2020-06-01',
           '2020-07-01', '2020-08-01', '2020-09-01', '2020-10-01',
           '2020-11-01', '2020-12-01', '2021-01-01', '2021-02-01',
           '2021-03-01'],
          dtype='datetime64[ns]', freq='MS')

If you wish to keep the same starting day for each month, you can then add the offset with pd.DateOffset() :

pd.date_range(start[:7], end, freq='MS') + pd.DateOffset(days=7)

will give

DatetimeIndex(['2020-03-08', '2020-04-08', '2020-05-08', '2020-06-08',
           '2020-07-08', '2020-08-08', '2020-09-08', '2020-10-08',
           '2020-11-08', '2020-12-08', '2021-01-08', '2021-02-08',
           '2021-03-08'],
          dtype='datetime64[ns]', freq=None)

As mentioned in comments, note that trouble may come with this workaround for offsets higher or equals to 28.

Skippy le Grand Gourou
  • 6,976
  • 4
  • 60
  • 76
  • 1
    This causes trouble if the desired date is on 29th and feb has 28 days. – anishtain4 Sep 26 '21 at 18:28
  • @anishtain4 Right (for the last part, adding an offset). Actually trouble with this method comes for any offset equals to or higher than 28. – Skippy le Grand Gourou Sep 27 '21 at 13:08
  • `'MS'` for `date_range` does *not* "makes the range start at the beginning of the next month". But it does include only date points *inside* the range defined by `start` and `end`. If the `start` is a date point in the freq supplied, `date_range` *will* return that date point in the result. – pablete Dec 06 '22 at 20:57
  • 1
    @pablete This was poorly worded indeed. Answer updated, thanks for the comment. – Skippy le Grand Gourou Dec 09 '22 at 08:39