2

I have Dataframe 1:

Hotel   DateFrom    DateTo      Room
BBB     2019-10-29  2020-03-27  DHS
BBB     2020-03-28  2020-10-30  DHS
BBB     2020-10-31  2021-03-29  DHS
BBB     2021-03-30  2099-01-01  DHS

And Dataframe 2:

Hotel   DateFrom    DateTo      Room    Food
BBB     2020-03-01  2020-04-24  DHS     A
BBB     2020-04-25  2020-05-03  DHS     B
BBB     2020-05-04  2020-05-31  DHS     C
BBB     2020-06-01  2020-06-22  DHS     D
BBB     2020-06-23  2020-08-26  DHS     E
BBB     2020-08-27  2020-11-30  DHS     F

I need to check if each row in df1 and if df1_DateFrom is between df2_DateFrom and df2_DateTo. Then i need to get that food code from df2 to new column in df1 or as new df3 shown below.

The result would look like this:

df3:

    Hotel   DateFrom    DateTo      Room  Food
    BBB     2019-10-29  2020-03-27  DHS   
    BBB     2020-03-28  2020-10-30  DHS   A
    BBB     2020-10-31  2021-03-29  DHS   F 
    BBB     2021-03-30  2099-01-01  DHS

I would really appreciate any help with this. I am kinda new on Pandas and still learning and i must say that it is bit complicated for me.

JerryC
  • 67
  • 1
  • 9
  • Welcome to stackoverflow! You are possibly trying to achieve something similar to what was asked in this question https://stackoverflow.com/questions/31328014/merging-dataframes-based-on-date-range – Jarno May 26 '20 at 15:07

2 Answers2

3

You can do a cross merge and query:

# recommend dealing with datetime type:
df1['DateFrom'],df1['DateTo'] = pd.to_datetime(df1['DateFrom']),pd.to_datetime(df1['DateTo'])
df2['DateFrom'],df2['DateTo'] = pd.to_datetime(df2['DateFrom']),pd.to_datetime(df2['DateTo'])

new_df = (df1.reset_index().merge(df2, on=['Hotel','Room'],
                                  how='left', suffixes=['','_'])
             .query('DateFrom_ <= DateFrom <= DateTo_')
         )
df1['Food'] = new_df.set_index('index')['Food']

Output:

  Hotel   DateFrom     DateTo Room Food
0   BBB 2019-10-29 2020-03-27  DHS  NaN
1   BBB 2020-03-28 2020-10-30  DHS    A
2   BBB 2020-10-31 2021-03-29  DHS    F
3   BBB 2021-03-30 2099-01-01  DHS  NaN
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thank You, worked perfectly - now i just need to figure out what exactly happens there :) – JerryC May 27 '20 at 07:35
  • So I was reading about merging and I think i understand it. But I have one more question. If i merge two dataframes: (**df1 has columns [hotel, room, guestName]**) and (**df2 has columns [hotel, room, dateFrom, dateTo]**). When I make this: RESULT = df1.merge(df2, on=['hotel', 'room']) - in result I will get frame that has columns: **[hotel, room, guestName, dateFrom, dateTo]**. Is there a way to get in result matching rows from both dataframes but columns only from df1 (**so that RESULT is [hotel, room, guestName]**). I am asking because I would like to skip another filtering after merge. – JerryC May 29 '20 at 06:00
0

Much less elegant then Quang Hoang's answer, but the solution using np.piecewise would look like this. See also https://stackoverflow.com/a/30630905/4873972

import pandas as pd
import numpy as np
from io import StringIO

# Creating the dataframes.
df1 = pd.read_table(StringIO("""
Hotel   DateFrom    DateTo      Room
BBB     2019-10-29  2020-03-27  DHS
BBB     2020-03-28  2020-10-30  DHS
BBB     2020-10-31  2021-03-29  DHS
BBB     2021-03-30  2099-01-01  DHS
"""), sep=r"\s+").convert_dtypes()

df1["DateFrom"] = pd.to_datetime(df1["DateFrom"])
df1["DateTo"] = pd.to_datetime(df1["DateTo"])

df2 = pd.read_table(StringIO("""
Hotel   DateFrom    DateTo      Room    Food
BBB     2020-03-01  2020-04-24  DHS     A
BBB     2020-04-25  2020-05-03  DHS     B
BBB     2020-05-04  2020-05-31  DHS     C
BBB     2020-06-01  2020-06-22  DHS     D
BBB     2020-06-23  2020-08-26  DHS     E
BBB     2020-08-27  2020-11-30  DHS     F
"""), sep=r"\s+").convert_dtypes()

df2["DateFrom"] = pd.to_datetime(df2["DateFrom"])
df2["DateTo"] = pd.to_datetime(df2["DateTo"])
# Avoid zero index for merging later on.
df2["id"] = np.arange(1, len(df2) +1 )

# Find matching indexes.
df1["df2_id"] = np.piecewise(
    np.zeros(len(df1)), 
    [(df1["DateFrom"].values >= start_date) & (df1["DateFrom"].values <= end_date) for start_date, end_date in zip(df2["DateFrom"].values, df2["DateTo"].values)], 
    df2.index.values
)

# Merge on matching indexes.
df1.merge(df2["Food"], left_on="df2_id", right_index=True, how="left")

Output:

  Hotel   DateFrom     DateTo Room Food
0   BBB 2019-10-29 2020-03-27  DHS  NaN
1   BBB 2020-03-28 2020-10-30  DHS    A
2   BBB 2020-10-31 2021-03-29  DHS    F
3   BBB 2021-03-30 2099-01-01  DHS  NaN
Jarno
  • 6,243
  • 3
  • 42
  • 57