2

after all cleaning data , i got to this point. can anyone help me please to finish it off?

I have 2 dataframe that after all cleaning and merging need to be merge to each other at the end !

df1:
PriceDate   tenor   ccy
3/07/2019   6   USD_AED

df2:
PriceDate   Term_By_Day     Ccy   rate
1/07/2019   7            USD_AED    2.34
2/07/2019   7            USD_AED    3.45
4/07/2019   7            USD_AED    4.89

so my goal is to merge this 2 data-frame based on exact "Ccy" and nearest "pricedate" and nearest "Tenor with Term_by_day".

so the ideal output will be :   
PriceDate   tenor    ccy       rate
3/07/2019   6     USD_AED      whatever it should be 

I believe i have to use interpolation with method=nearest ????? but i'm so lost now !

Can you please give me any tip if i'm on the right track and if interpolation is the best way to do it ?

cheers,

Z

Z.A
  • 55
  • 4

1 Answers1

0

You can find the nearest PriceDate in df2 for all rows in df1 like this:

nearest_rates = []
for row in df1.itertuples(index=False):
    nearest_index = df2.set_index('PriceDate').index.get_loc(row.PriceDate, method='nearest')
    nearest_rate  = df2.iloc[nearest_index]['rate']
    nearest_rates.append(nearest_rate)
df1['rate'] = nearest_rates

For your example, this will find nearest_index = 2 (due to the nearest method) with nearest_rate = 4.89 and append this to df1.

It is not quite clear to me if you need a combined dataframe or if you want to include the nearest rates from df2 in df1. For actually joining/merging the two sources there are multiple methods and you will have to define your specific conditions (column names e.g. Ccy vs. ccy; handling of possible duplicates etc.).

Gerd
  • 2,568
  • 1
  • 7
  • 20
  • many thanks for taking your time to explain this ! what you have mentioned is exactly what i'm looking for and i'll test it now however the condition is not only price date . as the data is huge so i would like it to be group by nearest "price date" , exact "curr", then nearest "term by day" to be compared to "tenor". so after grouping by these three columns, then i want them to be merged ! do you think we can add other columns into this function as well or group them by, then apply the function ? – Z.A Mar 29 '20 at 22:39
  • I am not sure if grouping is the approach that you need here. By grouping, you basically rearrange your data (e.g. if you want to filter all rows with a given `PriceDate` and/or `tenor`). You however want to find the "nearest" data point by multiple criteria - so I think you will probably have to define some function/measure for that "nearness", e.g. is a `PriceDate` difference of three days and a `tenor` difference of five nearer than a `PriceDate`difference of four days and a `tenor` difference of four? – Gerd Mar 30 '20 at 13:54
  • ok. this function didn't work but I have used merge.as_of to merge them however it gave me many NA values ! now i need to fill in NA values based on nearest non-NA . do you mind look at my other question if you have any idea about it ? – Z.A Mar 31 '20 at 01:40