Enriching one dataframe with data from another based on conditions

Question

I need some help to do some complex lookups and enrichment. I need this working in Python 3.5

Here is the scenario. I have 2 dataframes: dfMaster and dfLookup

dfMaster has columns fromDate, toDate, fromLocation, toLocation, LookupKey (This dataframe will get additional rows every day) The data in dfMaster has 2 rows:

1-May-2018, 3-May-2018, Amsterdam, Rotterdam, NaN

2-May-2018, 5-May-2018, Rotterdam, Amsterdam, NaN

dfLookup has columns lookupDate, lookupFromLocation, lookupToLocation, someKey (This is a fairly static dataframe) The data in dfLookup has 2 rows:

3-May-2018, Rotterdam, Amsterdam, Key1

5-May-2018, Rotterdam, Amsterdam, Key1

I want to do the following matching on dfMaster:

I want to pick dfLookup.lookupDate and check its between dfMaster.fromDate and dfMaster.toDate
Next, I want to check the lookupFromLocation matches fromLocation
Finally, I want to check the lookupToLocation matches toLocation

If all of this matches, I want to pick dfLookup.Key1 and put it into dfMaster.LookupKey

The final output should look like this: 1-May-2018, 3-May-2018, Amsterdam, Rotterdam, NaN

2-May-2018, 5-May-2018, Rotterdam, Amsterdam, Key1

Not finding an easy way to do this. I also want the final output to not be only filtered by what I enrich, but also by those rows that are not enriched

Thanks

@EvgenyPogrebnyak I have already tried many things. None of them are useful. I create a new filteredMasterDf using the 3 parameters of lookupDate, lookupFromLocation, lookupToLocation. I can then ofcourse update that row. But I dont want that. I want to enrich the entire dfMaster — Manish, Jun 15 '18 at 13:27
why not provide some reproducible inputs to your problem and the code you had. that saves time greatly on answering https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Evgeny, Jun 15 '18 at 13:32
I am trying to rig up an example. Can you please remove your downvote in the meanwhile? — Manish, Jun 15 '18 at 13:42
gladly, but the downvote is locked unless the question is edited — Evgeny, Jun 15 '18 at 13:45

score 1 · Accepted Answer · answered Jun 15 '18 at 14:00

Data Cleaning...

import pandas as pd
from pandas import Timestamp
from numpy import nan
import numpy as np

# Data cleaning
dfLookup = pd.DataFrame({'lookupDate': {0: Timestamp('2018-03-05 00:00:00'),
  1: Timestamp('2018-05-05 00:00:00')},
 'lookupFromLocation': {0: 'Rotterdam', 1: 'Rotterdam'},
 'lookupToLocation': {0: 'Amsterdam', 1: 'Amsterdam'},
 'someKey': {0: 'Key1', 1: 'Key1'}}
)

dfMaster = pd.DataFrame({'fromDate': {0: Timestamp('2018-01-05 00:00:00'),
  1: Timestamp('2018-02-05 00:00:00')},
 'toDate': {0: Timestamp('2018-05-03 00:00:00'),
  1: Timestamp('2018-05-05 00:00:00')},
 'fromLocation': {0: 'Amsterdam', 1: 'Rotterdam'},
 'toLocation': {0: 'Rotterdam', 1: 'Amsterdam'},
 'LookupKey': {0: nan, 1: nan}}
)

# More data cleaning
dfLookup.lookupDate = pd.to_datetime(dfLookup.lookupDate)
dfMaster.fromDate = pd.to_datetime(dfMaster.fromDate)
dfMaster.toDate = pd.to_datetime(dfMaster.toDate)

pd.merge using the city columns in both dataframes

df = pd.merge(dfMaster, dfLookup, how='left', right_on=['lookupFromLocation', 'lookupToLocation'], left_on=['fromLocation', 'toLocation'])

# Check the date is in between the desired dates
df.LookupKey = df.LookupKey.mask(df.lookupDate.between(df.fromDate, df.toDate), df.someKey)

# Remove duplicates
df = df[dfMaster.columns].drop_duplicates()

Out[]:
    fromDate     toDate fromLocation toLocation LookupKey
0 2018-01-05 2018-05-03    Amsterdam  Rotterdam       NaN
1 2018-02-05 2018-05-05    Rotterdam  Amsterdam      Key1

Hi Dillon, Thanks for the code snippet. This works fine. I have further complications. My lookup table has multiple records which could all match with Master. So the merged table has too many extra rows. I need to find a way to get my lookup to go down to 1-to-1 relation with master. — Manish, Jun 18 '18 at 14:54

Enriching one dataframe with data from another based on conditions

1 Answers1

Data Cleaning...