
I want to merge the dataframe df2 onto df1 at the closest time. df2 has fewer rows than df1. Both are time series with different sampling times (in seconds):

df1
      A  B
1 s  1   2
2 s  2   4
3 s  3   5

df2
       C  D
0.2 s  1  2
1.7 s  2  4

df3 = pd.merge_asof(df1, df2, left_index=True, right_index=True)

df3
     A  B  C  D
1 s  1  2  1  2
2 s  2  4  2  4
3 s  3  5  2  4

Pandas fills in the missing row with a zero-order hold (the last matched df2 row is repeated),

but I would like NaN in the row that has no match, like the following:

df3
     A  B    C    D
1 s  1  2    1    2
2 s  2  4    2    4
3 s  3  5  NaN  NaN

What's the most elegant way to achieve this?
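
One option that may fit here, sketched under assumptions rather than offered as the definitive answer: `pd.merge_asof` accepts a `tolerance` argument, and matches further away than that are left as NaN. The float indexes below stand in for the "1 s", "0.2 s" labels, and `tolerance=1.0` (about one df1 sampling period) is an assumed value, not something stated in the question.

```python
import pandas as pd

# Numeric indexes in seconds stand in for the "1 s" / "0.2 s" labels in the post.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 5]}, index=[1.0, 2.0, 3.0])
df2 = pd.DataFrame({"C": [1, 2], "D": [2, 4]}, index=[0.2, 1.7])

# Default direction is "backward": each df1 row takes the last df2 row at or
# before it. tolerance=1.0 is an assumption (roughly df1's sampling period);
# a df2 row older than that is not matched, so C and D become NaN there.
df3 = pd.merge_asof(df1, df2, left_index=True, right_index=True, tolerance=1.0)
print(df3)
```

This only works when a stale match is always further away than a fresh one, so the right tolerance depends on the actual sampling times.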

Bruno
  • Use `df3 = pd.merge(df1,df2,left_index=True, right_index=True)` ? – jezrael May 03 '20 at 07:48
  • That would merge only the index values that are common. I think I did not explain it well. I'll rewrite the question :) – Bruno May 03 '20 at 07:53
  • I tried your code and got `MergeError: left can only have one index`. Is it working for you? – jezrael May 03 '20 at 08:03
  • Maybe because I added the 's' to show that those are seconds just in this post. Actually the code that works for me is the following `df1 = pd.DataFrame(data={'A':[1,2,3], 'B':[2,4,5]}, index = [0,1,2]) df2 = pd.DataFrame(data={'C':[1,2], 'D':[2,4]}, index = [0,1]) df3 = pd.merge_asof(df1,df2,left_index=True, right_index=True) ` – Bruno May 03 '20 at 08:15
  • In the end I solved it like this: `df3 = pd.merge_asof(df1,df2,left_index=True, right_index=True); for col in df3.columns.values: df3.loc[df3[col].duplicated(),col] = np.nan` – Bruno May 03 '20 at 08:16
  • So basically I go through the dataframe again and I substitute duplicates with NaN. I was expecting a more elegant solution but that's fine – Bruno May 03 '20 at 08:18
  • Is it possible to test with `df2 = pd.DataFrame(data={'C':[1,1], 'D':[1,1]}, index = [0,1])` whether your solution with duplicated is correct? – jezrael May 03 '20 at 08:25
  • Oh, you are right, that would not work... – Bruno May 03 '20 at 14:44
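
The problem raised in the last comments is that masking on duplicated values also blanks legitimate matches whenever df2 contains repeated values. A possible workaround, given here only as a sketch, is to test for duplicates on which df2 row was matched rather than on its values. The helper column name `t2` and the float indexes are assumptions made for illustration; the df2 values are jezrael's test case.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 5]}, index=[1.0, 2.0, 3.0])
# jezrael's test case: df2 holds repeated values, so value-based masking fails.
df2 = pd.DataFrame({"C": [1, 1], "D": [1, 1]}, index=[0.2, 1.7])

# Copy df2's index into a helper column ("t2" is a hypothetical name) so each
# merged row records which df2 row it was matched with.
right = df2.assign(t2=df2.index)
df3 = pd.merge_asof(df1, right, left_index=True, right_index=True)

# A df2 row that is matched again by a later df1 row is a carried-forward value.
reused = df3["t2"].duplicated().to_numpy()

cols = list(df2.columns)
df3[cols] = df3[cols].astype("float64")  # make room for NaN in integer columns
df3.loc[reused, cols] = np.nan
df3 = df3.drop(columns="t2")
print(df3)
```

Because the check is on the matched df2 timestamp, two distinct df2 rows with equal values are both kept, and only the repeated use of the same df2 row is replaced with NaN.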

0 Answers