1

I need to combine two datasets, into a single dataset.

I have two generated data frames - pandas, one with data samples every ~second, another with data sampled every ~120 seconds.

How can i merge these two, governed by the interval of the app. 120 sec generating task.

I have currently pulled every 120'th sampleset out of the fast generated 1 sec. data set. These are not accurate, 1 second and contain a bit of jitter.

                     Time  Torque [Nm]  Speed [1/s]
54240 2017-04-05 21:21:21       938.00       3000.0
54252 2017-04-05 21:23:23       936.25       3000.0
54264 2017-04-05 21:25:24       948.50       3000.0
54276 2017-04-05 21:27:26       948.50       3000.0
54288 2017-04-05 21:29:28       936.25       3000.0
54300 2017-04-05 21:31:29       952.00       3000.0
54312 2017-04-05 21:33:31       945.00       3000.0
54324 2017-04-05 21:35:33       927.50       3000.0

And likewise i have the data set with 120 sec intevals

                   Time   FFT ISO   FFTe: FO
0   2017-04-05 21:26:08   20.5754  16.377570
1   2017-04-05 21:28:08  106.1549  32.836566
2   2017-04-05 21:30:07   16.2735  19.308864
3   2017-04-05 21:32:08   24.2232  42.766070
4   2017-04-05 21:34:08   35.5723  64.152879
5   2017-04-05 21:36:08    3.7364  29.323316
6   2017-04-05 21:38:08   21.8207  17.796711
7   2017-04-05 21:40:08    9.9334  49.642802

The timestamps are not identical, and may contain a bit of jitter.

I would like to combine the data columns, so the data (Torque [Nm], Speed [1/s], FFT ISO, FFTe: FO) occurring within the same 120 second interval are combined.

Perhaps i should define a 120 second "reference interval", and fit the data into these equally sized slots.

A assume it can be done using pd.concat or pd.append but i have not quite figured how

Any help is appreciated

opprud
  • 169
  • 1
  • 1
  • 9
  • 1
    Have you looked at any questions like [this one](https://stackoverflow.com/questions/34880539/pandas-merging-based-on-a-timestamp-which-do-not-match-exactly)? – cmaher Jul 27 '17 at 12:06

1 Answers1

4

Use the resample/mean method to normalize the indexes to both have frequency 120S by taking the mean of all values in each 120 second period.

resampled1 = df1.resample('120S').mean()
resampled2 = df2.resample('120S').mean()
result = resampled1.join(resampled2)

For example,

import numpy as np
import pandas as pd
np.random.seed(2017)

def make_index(N, freq):
    index = pd.date_range('2000-1-1', periods=N, freq=freq).view('i8')
    index = (np.sort(index + np.random.uniform(0, np.diff(index).mean(), size=N).astype(int))
             .view('datetime64[ns]'))
    return index

N = 100
sec_index = make_index(120*N, 'S')
sec120_index = make_index(N, '120S')

df1 = pd.DataFrame({'Torque': np.random.random(120*N),
                    'Speed': np.random.random(120*N),
                    'Time': sec_index})

df2 = pd.DataFrame({'FFT ISO': np.random.random(N),
                    'FFTe: FO': np.random.random(N),
                    'Time': sec120_index})

df1 = df1.set_index('Time')
df2 = df2.set_index('Time')

resampled1 = df1.resample('120S').mean()
resampled2 = df2.resample('120S').mean()
result = resampled1.join(resampled2)
print(result.head())

yields

                        Speed    Torque   FFT ISO  FFTe: FO
Time                                                       
2000-01-01 00:00:00  0.482262  0.470523  0.435150  0.289036
2000-01-01 00:02:00  0.501221  0.476776  0.005576  0.284386
2000-01-01 00:04:00  0.491305  0.459710  0.249217  0.253787
2000-01-01 00:06:00  0.486900  0.498921  0.391429  0.854698
2000-01-01 00:08:00  0.485611  0.517818  0.071058  0.552727
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677