1

I have first pandas dataframe like following

 trans_df
 code     price       date           time        product
 12023    71.23       01-01-2018     06:23:00    MS  
 12023    61          01-01-2018     07:56:00    HS
 12023    71.23       01-01-2018     08:34:00    MS
 12023    71.30       01-01-2018     06:03:00    MS  
 12023    61          01-01-2018     11:43:00    HS
 12023    71.23       01-01-2018     10:11:00    MS
 12023    71.23       01-01-2018     04:23:00    MS
 12023    72.23       02-01-2018     10:11:00    MS
 12023    72.23       02-01-2018     04:23:00    MS

Now, I have master price dataframe from where I am checking whether the price set in trans_df is correct or not basis the transaction date and time is between effective_date_from and effective_date_to in master_price for that particular product

master_price
code  price  effective_date_from    effective_date_to  time_from   time_to  product
12023 71.23  01-01-2018             02-01-2018         06:00:00    05:59:00  MS     
12023 61     01-01-2018             02-01-2018         06:00:00    05:59:00  HS 
12023 72.23  02-01-2018             03-01-2018         06:00:00    05:59:00  MS  

Desired dataframe would be

 trans_df

 code     price       date           time        product  flag      actual_price
 12023    71.23       01-01-2018     06:23:00    MS       match     71.23    
 12023    61          01-01-2018     07:56:00    HS       match     61
 12023    71.23       01-01-2018     08:34:00    MS       match     71.23
 12023    71.30       01-01-2018     06:03:00    MS       mismatch  71.23 
 12023    61          01-01-2018     11:43:00    HS       match     61
 12023    71.23       01-01-2018     10:11:00    MS       match     71.23
 12023    71.23       01-01-2018     04:23:00    MS       nan       nan
 12023    72.23       02-01-2018     10:11:00    MS       match     72.23
 12023    72.23       02-01-2018     04:23:00    MS       match     72.23
Neil
  • 7,937
  • 22
  • 87
  • 145

1 Answers1

3

Use:

#convert dates with times to datetimes
master_price['effective_date_from'] = (pd.to_datetime(master_price['effective_date_from'], 
                                       format='%d-%m-%Y') + 
                                      pd.to_timedelta(master_price['time_from']))
master_price['effective_date_to'] = (pd.to_datetime(master_price['effective_date_to'], 
                                     format='%d-%m-%Y') + 
                                     pd.to_timedelta(master_price['time_to']))
trans_df['date'] = (pd.to_datetime(trans_df['date'], format='%d-%m-%Y') +
                    pd.to_timedelta(trans_df['time']))

#join together and filter between 
df = trans_df.merge(master_price, on=['code','product'], how='left')
df = df[df.date.between(df.effective_date_from, df.effective_date_to)]

#add only filterd rows to original
df = trans_df.merge(df, on=['code','product','date','time'], how='left')
cols = ['effective_date_from', 'effective_date_to', 'time_to','time_from','price_x']
df = df.drop(cols, axis=1)
#first test missing values then match.mismatch
df['flag'] = np.select([df['price_y'].isnull(), 
                        df['price_y'] == df['price']], 
                       [np.nan, 'match'], default='mismatch')
df = df.rename(columns={'price_y':'actual_price'})
print (df)
    code  price                date      time product  actual_price      flag
0  12023  71.23 2018-01-01 06:23:00  06:23:00      MS         71.23     match
1  12023  61.00 2018-01-01 07:56:00  07:56:00      HS         61.00     match
2  12023  71.23 2018-01-01 08:34:00  08:34:00      MS         71.23     match
3  12023  71.30 2018-01-01 06:03:00  06:03:00      MS         71.23  mismatch
4  12023  61.00 2018-01-01 11:43:00  11:43:00      HS         61.00     match
5  12023  71.23 2018-01-01 10:11:00  10:11:00      MS         71.23     match
6  12023  71.23 2018-01-01 04:23:00  04:23:00      MS           NaN       nan
7  12023  72.23 2018-01-02 10:11:00  10:11:00      MS         72.23     match
8  12023  72.23 2018-01-02 04:23:00  04:23:00      MS         71.23  mismatch
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I have multiple dates in trans_df and master_price, so at the time of joining these two dataframes we will also have to use date? – Neil Dec 03 '18 at 07:48
  • @Neil - hmmm, need only one datetime column, one possible solution for do it is use `melt`. Unfortunately it create more rows, so solution should be memory consumed. Another solution should be create function, loop each row and match. This solution is slow :( – jezrael Dec 03 '18 at 07:50
  • updated my question. If we want to use if loop, then will it be too slow? – Neil Dec 03 '18 at 07:54
  • @Neil - it depends what is size of both DataFrames. – jezrael Dec 03 '18 at 07:54
  • 2-3 lacs records in transaction df and ~2k entries in master price df – Neil Dec 03 '18 at 07:55
  • @Neil - it is a lot of... Now thinking if not possible use this solution with modification like `df = df[df.date.between(df.effective_date_from, df.effective_date_to) | df.date.between(df.effective_date_from1, df.effective_date_to2)]` – jezrael Dec 03 '18 at 08:00
  • `df = trans_df.merge(master_price, on=['code','product'], how='left')` this remains same or do we have to add date as well? – Neil Dec 03 '18 at 08:07
  • @Neil - I think yes, because need to join columns by these 2 columns only – jezrael Dec 03 '18 at 09:10
  • If we want to use if loop which will loop over all the transaction and make the comparison how can we do it? – Neil Dec 05 '18 at 10:55
  • @Neil - hmmm, can you change data for this? Or create new question? Because not sure if understand each other for 100%. Thanks. – jezrael Dec 05 '18 at 11:09
  • sure I will do that. – Neil Dec 05 '18 at 11:40
  • @Neil - I read conversation under new question now, better is add link under some my answer, because I am not notify and some users should be angry... – jezrael Dec 05 '18 at 13:09
  • @Neil - working with your new question and is `master_price` correct? Because datetime ranges are overlaping, e.g. for first and second row fr same code and same product. – jezrael Dec 05 '18 at 13:24
  • there can be two price changes in the same day. – Neil Dec 05 '18 at 13:44
  • So it is really problem. You need change datetimes from and to to unique ranges and some fast non loop solution is not easy (I guess loop solution should be not easy too :( ) – jezrael Dec 05 '18 at 14:00
  • I see. anything else we can do about it? – Neil Dec 05 '18 at 14:13
  • @Neil - Main problem is change datetimes for non overlaping and then use this answer. – jezrael Dec 05 '18 at 14:20
  • considering unique price master data what would be the solution ? – Neil Dec 05 '18 at 18:35