1

I am working with two csv files and imported as dataframe. Let's say df1 and df2 as in picture, df1 and df2 which are of different length. df1 has 50000 rows and df2 has 20000 rows.

df1

df2

I want to compare (iterate through rows) the 'time' of df2 with df1, find the difference in time and return the values of all column corresponding to similar row.

For example, 66 (of 'time' in df1) is nearest to 67 (of 'time' in df2), So I would like to return the contents to df1 (15'vel' and 25'yaw') to df2 and save as a new csv

Community
  • 1
  • 1
ankushbraj
  • 73
  • 1
  • 11
  • 1
    Welcome to SO. Please provide a **[mcve]**. Also see: **[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)**. – jpp Apr 20 '18 at 11:26
  • Thanks for the edit suggestion @jpp – ankushbraj Apr 20 '18 at 11:35
  • Please take the time to read those linked pages. Images are *not* good for providing a [good pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jpp Apr 20 '18 at 11:36

2 Answers2

0

Create cartesian product and then do filtering -

df1 = pd.DataFrame({'time': [11, 22, 33,44, 55, 66,77,88,99], 'vel':[10, 11,12,13,14,15,16,17,18],
                   'yaw' : [20, 21, 22,23,24,25,26,27,28]})
df2 = pd.DataFrame({'time' : [67, 75, 87, 99]})

df1['key'] = 1
df2['key'] = 1
df1.rename(index=str, columns ={'time' : 'time_x'}, inplace=True)

df = df2.merge(df1, on='key', how ='left').reset_index()
df['diff'] = df.apply(lambda x: abs(x['time']  - x['time_x']), axis=1)
df.sort_values(by=['time', 'diff'], inplace=True)

df=df.groupby(['time']).first().reset_index()[['time', 'vel', 'yaw']]
Aritesh
  • 1,985
  • 1
  • 13
  • 17
  • Thanks for the solution, I run out of memory and kernel dies. Since i have almost df1-8000 rows values and df2-4000 values in real problem., Do you have any solution regrading memory issues @Aritesh – ankushbraj Apr 20 '18 at 12:43
0

It can be done with the help of iterrows() function.

Here is the code :

first table creation :

value=[(0,11,10,20),(1,22,11,21),(2,33,12,22),(3,44,13,23),(4,55,14,24), 
(5,66,15,25),(6,77,16,26),(7,88,17,27),(8,99,18,28)]
header=["index","time","vel","yaw"]
df1 = pd.DataFrame.from_records(value, columns=header)

df1

second table creation:

value=[(0,67,"nan","nan"),(1,75,"nan" ,"nan" ),(2,87,"nan" ,"nan" ) 
(3,99,"nan" ,"nan" )]
header=["index","time","vel","yaw"]
df2 = pd.DataFrame.from_records(value, columns=header)

df2

storing the result in df2 itself:

for index, row in df2.iterrows():
    min=10000000
    for indexer, rows in df1.iterrows():
        if abs(row['time']-rows['time'])<min:
            min = abs(row['time']-rows['time'])
            #storing the position 
            pos = indexer
    df2.loc[index,'vel'] = df1['vel'][pos]
    df2.loc[index,'yaw'] = df1['yaw'][pos]

result

nisahbhtrap
  • 92
  • 1
  • 1
  • 8