-1

I need some help to do some complex lookups and enrichment. I need this working in Python 3.5

Here is the scenario. I have 2 dataframes: dfMaster and dfLookup

dfMaster has columns fromDate, toDate, fromLocation, toLocation, LookupKey (This dataframe will get additional rows every day) The data in dfMaster has 2 rows:

1-May-2018, 3-May-2018, Amsterdam, Rotterdam, NaN

2-May-2018, 5-May-2018, Rotterdam, Amsterdam, NaN

dfLookup has columns lookupDate, lookupFromLocation, lookupToLocation, someKey (This is a fairly static dataframe) The data in dfLookup has 2 rows:

3-May-2018, Rotterdam, Amsterdam, Key1

5-May-2018, Rotterdam, Amsterdam, Key1

I want to do the following matching on dfMaster:

  • I want to pick dfLookup.lookupDate and check its between dfMaster.fromDate and dfMaster.toDate
  • Next, I want to check the lookupFromLocation matches fromLocation
  • Finally, I want to check the lookupToLocation matches toLocation

If all of this matches, I want to pick dfLookup.Key1 and put it into dfMaster.LookupKey

The final output should look like this: 1-May-2018, 3-May-2018, Amsterdam, Rotterdam, NaN

2-May-2018, 5-May-2018, Rotterdam, Amsterdam, Key1

Not finding an easy way to do this. I also want the final output to not be only filtered by what I enrich, but also by those rows that are not enriched

Thanks

Manish
  • 23
  • 1
  • 9
  • @EvgenyPogrebnyak I have already tried many things. None of them are useful. I create a new filteredMasterDf using the 3 parameters of lookupDate, lookupFromLocation, lookupToLocation. I can then ofcourse update that row. But I dont want that. I want to enrich the entire dfMaster – Manish Jun 15 '18 at 13:27
  • why not provide some reproducible inputs to your problem and the code you had. that saves time greatly on answering https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Evgeny Jun 15 '18 at 13:32
  • I am trying to rig up an example. Can you please remove your downvote in the meanwhile? – Manish Jun 15 '18 at 13:42
  • 1
    gladly, but the downvote is locked unless the question is edited – Evgeny Jun 15 '18 at 13:45

1 Answers1

1

Data Cleaning...

import pandas as pd
from pandas import Timestamp
from numpy import nan
import numpy as np

# Data cleaning
dfLookup = pd.DataFrame({'lookupDate': {0: Timestamp('2018-03-05 00:00:00'),
  1: Timestamp('2018-05-05 00:00:00')},
 'lookupFromLocation': {0: 'Rotterdam', 1: 'Rotterdam'},
 'lookupToLocation': {0: 'Amsterdam', 1: 'Amsterdam'},
 'someKey': {0: 'Key1', 1: 'Key1'}}
)

dfMaster = pd.DataFrame({'fromDate': {0: Timestamp('2018-01-05 00:00:00'),
  1: Timestamp('2018-02-05 00:00:00')},
 'toDate': {0: Timestamp('2018-05-03 00:00:00'),
  1: Timestamp('2018-05-05 00:00:00')},
 'fromLocation': {0: 'Amsterdam', 1: 'Rotterdam'},
 'toLocation': {0: 'Rotterdam', 1: 'Amsterdam'},
 'LookupKey': {0: nan, 1: nan}}
)

# More data cleaning
dfLookup.lookupDate = pd.to_datetime(dfLookup.lookupDate)
dfMaster.fromDate = pd.to_datetime(dfMaster.fromDate)
dfMaster.toDate = pd.to_datetime(dfMaster.toDate)

pd.merge using the city columns in both dataframes

df = pd.merge(dfMaster, dfLookup, how='left', right_on=['lookupFromLocation', 'lookupToLocation'], left_on=['fromLocation', 'toLocation'])

# Check the date is in between the desired dates
df.LookupKey = df.LookupKey.mask(df.lookupDate.between(df.fromDate, df.toDate), df.someKey)

# Remove duplicates
df = df[dfMaster.columns].drop_duplicates()

Out[]:
    fromDate     toDate fromLocation toLocation LookupKey
0 2018-01-05 2018-05-03    Amsterdam  Rotterdam       NaN
1 2018-02-05 2018-05-05    Rotterdam  Amsterdam      Key1
Dillon
  • 997
  • 4
  • 13
  • Hi Dillon, Thanks for the code snippet. This works fine. I have further complications. My lookup table has multiple records which could all match with Master. So the merged table has too many extra rows. I need to find a way to get my lookup to go down to 1-to-1 relation with master. – Manish Jun 18 '18 at 14:54