2

I have DataFrames that represent data from two different sensors:

In[0]: df0
Out[0]:
   time  foo  
0   0.1  123  
1   1.0  234  
2   2.1  345
3   3.1  456  
4   3.9  567  
5   5.1  678  

In[0]: df1
Out[0]:
   time  bar  
0  -0.9  876  
1  -0.1  765  
2   0.7  654  
3   2.1  543  
4   3.0  432  

The sensors provide a measure (foo or bar) and a timestamp (time) for each events that they are monitoring. A couple things to note:

  1. the timestamps are close, but not identical
  2. the range over which data was collected is different across sensors (i.e. they were turned on and turned off independently)

I'm trying to align df0 and df1 to get the following:

In[3]: df3
Out[3]:
   time_df0  foo  time_df1  bar  
0       nan  nan       -0.9  876
1       0.1  123       -0.1  765
2       1.0  234        0.7  654
3       2.1  345        2.1  543
4       3.1  456        3.0  432
5       3.9  567        nan  nan
6       5.1  678        nan  nan
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
ajwood
  • 18,227
  • 15
  • 61
  • 104
  • 2
    To get you started: http://stackoverflow.com/questions/21201618/pandas-merge-match-the-nearest-time-stamp-the-series-of-timestamps and http://stackoverflow.com/questions/24614474/pandas-merge-on-name-and-closest-date Also http://stackoverflow.com/questions/33491840/merging-two-pandas-dataframes-on-nearest-time-stamp – Kartik Sep 13 '16 at 17:50
  • @ajwood, is your `time` column of `datetime` dtype? – MaxU - stand with Ukraine Sep 13 '16 at 18:18
  • @MaxU no, it's a float offset (in seconds) from when the sensor was turned on; the only reason the times are close between sensors is because I've done other processing for to determine find that offset. Now I just need to to create a common index – ajwood Sep 13 '16 at 18:26
  • @ajwood, your sample data sets are absolutely fine. I thought that your `time` column is of `datetime` dtype - that's why i asked you to change your sample data sets (just ignore my first comment)... – MaxU - stand with Ukraine Sep 13 '16 at 18:30

1 Answers1

1

@Kartik posted a perfect links to start with...

Here is a starting point:

df0.set_index('time', inplace=True)
df1.set_index('time', inplace=True)

In [36]: df1.reindex(df0.index, method='nearest').join(df0)
Out[36]:
      bar  foo
time
0.1   765  123
1.0   654  234
2.1   543  345
3.1   432  456
3.9   432  567
5.1   432  678
Community
  • 1
  • 1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Is there any way to make the leading and trailing rows "follow" the new index (i.e. the `nan` values in my example)? – ajwood Sep 13 '16 at 18:47
  • A brute-force solution could be trying several integer offsets on the original index, minimize the difference between timestamps, then merging of the optimal index offset. – ajwood Sep 13 '16 at 18:53
  • @ajwood, sorry, currently i don't see any elegant solution for not-matching rows... – MaxU - stand with Ukraine Sep 13 '16 at 19:36
  • no problem - these datasets will be small, so the brute force solution should work for me. I'll add an answer once I have something concise to report – ajwood Sep 13 '16 at 19:48