0

I have a dataframe which contains a Time Stamp column, and two data columns (data1 and data2).

The data1 column spans the entire Time Stamp, while the data2 column stops about halfway. When I was collecting my data, both data1 and data2 collected data for the same time, except at different frequencies.

I would like the data2 column to I understand that I should be leaning towards the resample or reindex functions, but I am unsure how to do this. My Time Stamp column is an object, while my two data columns are float64 types.

What is the easiest way for me to accomplish this goal?

I have tried to refer to the following question, but I was having trouble implementing it: PANDAS - Loop over two datetime indexes with different sizes to compare days and values

enter image description here

Gary
  • 2,137
  • 3
  • 23
  • 41
  • 1
    Please post a sample of your dataframe and your desired output for potential answerers to visualize your problem. – ysearka Jul 20 '17 at 13:10
  • So the times correspond to the high frequency data? And your low frequency data is currently not associated with a correct timestamp? (except at 0) – GeoMatt22 Jul 20 '17 at 13:18
  • @GeoMatt22 Correct - my `data1` corresponds to the TimeStamp in my `.csv`, while `data2` does not. I would like `data2` to span the length of `data1`. – Gary Jul 20 '17 at 13:20
  • @GeoMatt22 `data2` is sampled at 10Hz, while `data1` is sampled at 500Hz. – Gary Jul 20 '17 at 13:26
  • OK. Before any resampling you would first need to register the 10Hz data (i.e. locate the current data in time). So you need to split off `data2` into a separate dataframe and then assign the appropriate index (note: this is [not a `pd.DataFrame.reindex`](https://stackoverflow.com/questions/10943478/pandas-reindex-dataframe-with-datetime-objects)). – GeoMatt22 Jul 20 '17 at 13:32
  • Is there a way for me to fill extend `data2` all the way down the entire index of `data1`, and just fill in everything in between with `NaN`? – Gary Jul 20 '17 at 13:35
  • Can you do `df['data2_new'] = np.NaN` then `df.loc[::50, 'data2_new'] = df[:len(df)//50, 'data2']` (where 50 = 500 Hz/10 Hz)? – GeoMatt22 Jul 20 '17 at 13:45

1 Answers1

1

Here's what I think you're trying to do. My assumptions is that your timestamps are aligned by some multiplier. I've used every 2 minutes in my example, since that's what your example appears to be. Here's my sample dataframe:

df
                        a     b
DATE
2017-05-29 06:30:00   0.0   0.0
2017-05-29 06:31:00   9.0  24.0
2017-05-29 06:32:00  10.0   1.0
2017-05-29 06:33:00  10.0   1.0
2017-05-29 06:34:00   0.0   7.0
2017-05-29 06:35:00   3.0   3.0
2017-05-29 06:36:00   0.0   4.0
2017-05-29 06:37:00   0.0   1.0
2017-05-29 06:38:00   0.0   0.0
2017-05-29 06:39:00   0.0   2.0
2017-05-29 06:40:00   0.0   NaN
2017-05-29 06:41:00   0.0   NaN
2017-05-29 06:42:00   0.0   NaN
2017-05-29 06:43:00   0.0   NaN
2017-05-29 06:44:00   0.0   NaN
2017-05-29 06:45:00   2.0   NaN
2017-05-29 06:46:00   4.0   NaN
2017-05-29 06:47:00   0.0   NaN
2017-05-29 06:48:00   4.0   NaN
2017-05-29 06:49:00   8.0   NaN

Extract the misaligned column to it's own dataframe and add a counter column, then add the timedelta to the index, replace the old index, and concatenate the data columns.

b = df['b'][:10].to_frame()
b.insert(0, 'counter', range(len(b)))
b.index = b.index.to_series().apply(lambda x: x + pd.Timedelta(minutes=b.loc[x].counter))
pd.concat([df['a'], b['b']], axis=1)
                        a     b
DATE
2017-05-29 06:30:00   0.0   0.0
2017-05-29 06:31:00   9.0   NaN
2017-05-29 06:32:00  10.0  24.0
2017-05-29 06:33:00  10.0   NaN
2017-05-29 06:34:00   0.0   1.0
2017-05-29 06:35:00   3.0   NaN
2017-05-29 06:36:00   0.0   1.0
2017-05-29 06:37:00   0.0   NaN
2017-05-29 06:38:00   0.0   7.0
2017-05-29 06:39:00   0.0   NaN
2017-05-29 06:40:00   0.0   3.0
2017-05-29 06:41:00   0.0   NaN
2017-05-29 06:42:00   0.0   4.0
2017-05-29 06:43:00   0.0   NaN
2017-05-29 06:44:00   0.0   1.0
2017-05-29 06:45:00   2.0   NaN
2017-05-29 06:46:00   4.0   0.0
2017-05-29 06:47:00   0.0   NaN
2017-05-29 06:48:00   4.0   2.0
2017-05-29 06:49:00   8.0   NaN

It probably goes without saying, but it would be much better to apply correct timestamps to each of the columns when you ingest them.

jxstanford
  • 3,339
  • 3
  • 27
  • 39