1

Say I have a data frame with two columns

Start      End
1/1/2015  1/5/2015
1/10/2015 1/12/2015

What would be the best method to get the dates in between the start and end (the actual dates, not the number of days in between) including the start and end dates.

For example I would get 1/1/2015, 1/2/2015, ..., 1/5/2015.

The reason I want to do this is to find out how many weekend are between two dates.

Here is the example data frame plus the code to parse dates quickly.

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date:pd.to_datetime(date) for date in s.unique()}
    return s.map(dates)

df = pd.DataFrame({"Start": ["1/1/2015", "1/10/2015"], "End": ["1/5/2015", "1/12/2015"]})
df["Start"] = lookup(df["Start"])
df["End"] = lookup(df["End"])

If someone knows a better way to do this please let me know as I think there is a better way to find the number of weekends between two dates.

I have tried to understand the pd.date_range() function and have tried applying it like this.

df["between"] = pd.date_range(df["Start"], df["End"])

But get an error saying it cannot convert the input, I know I am using this function incorrectly. I think I need to use apply but am unsure how to use it with this function.

Any help is appreciated. Please let me know if you need any more information.

Thank you for your time.

mrsquid
  • 605
  • 2
  • 9
  • 24
  • possible duplicated with https://stackoverflow.com/questions/13019719/get-business-days-between-start-and-end-date-using-pandas – Hernan Acosta Aug 23 '19 at 19:51
  • Instead of enumerating dates (which might be inefficient if the dates are far apart), you might want to [count the number of weeks between two dates](https://stackoverflow.com/a/14191915/190597). – unutbu Aug 23 '19 at 20:07

1 Answers1

3

you can leverage the built-in dataoffsets that pandas uses. bdate_range() is going to be your friend here

# create a dataframe of dates
df = pd.DataFrame({'Dates': pd.date_range("2015-01-01", "2019-08-01")})

# create a series of business days
busines_dates = pd.bdate_range("2015-01-01", "2019-08-30")

# find where the two do not intersect
df.loc[~df['Dates'].isin(busines_dates)]

I feel like you may want this as a funciton based on your question. Here is a basic one:

def weekends(start, end):
    df = pd.DataFrame({'Dates': pd.date_range(start, end)})
    busines_dates = pd.bdate_range(start, end)
    answer = df.loc[~df['Dates'].isin(busines_dates)]
    print("There are", answer.shape[0], 'weekends between', start, 'and', end)
    return answer

weekends("2015-01-01", "2019-01-01")


There are 418 weekends between 2015-01-01 and 2019-01-01
          Dates
2    2015-01-03
3    2015-01-04
9    2015-01-10
10   2015-01-11
16   2015-01-17
...         ...
1445 2018-12-16
1451 2018-12-22
1452 2018-12-23
1458 2018-12-29
1459 2018-12-30
MattR
  • 4,887
  • 9
  • 40
  • 67
  • Wow this is excellent! Thank you, this is above and beyond what I expected, this function will be extremely useful! Thanks! – mrsquid Aug 23 '19 at 22:14