-2

I have 3 resampled pandas dataframes using the same data indexed by datetime.

Each dataframe is resampled using a different timeframe (e.g. 30 min / 60 min / 240 min).

Two of the dataframes have resampled correctly, with the datetimes aligned, because they have an equal number of rows (20). The third dataframe has only 12 rows because there isn't enough data to create 20 rows resampled to 240 min.

How can I adjust the 240min dataframe so the datetimes are aligned with the other 2 dataframes?

For example, every 2nd row in the 30min dataframe equals the corresponding row in the 60min dataframe, and every 4th row in the 60min dataframe should equal the corresponding row in the 240min dataframe. This is not the case, because the 240min dataframe has resampled the datetimes differently due to there not being enough data to create 20 rows.

Georgy
esTERY
  • 1
    I don't think someone can answer that. It depends on the noise you accept on your data, as there are many ways to fill some more data. You can for example just duplicate rows. But you can "create" data from scratch and hope it will be useful – Roim May 23 '20 at 07:22
  • Of course you can, you just have to control the start/end of indexing. I think this is a common issue and hopefully someone with strong resampling knowledge can answer. – esTERY May 23 '20 at 08:14
  • 5
    Then maybe I misunderstood you. Please give a more concrete example of what you expected the dataframes to look like – Roim May 23 '20 at 09:00
  • 2
    Can you provide an example with code so that we can help you? – pakallis May 31 '20 at 15:33
  • 2
    We need help duplicating your input dataframes then, it will be easier to offer a solution. Can you create some mock code creating your three different dataframes? – Scott Boston Jun 01 '20 at 03:19
  • use pandas interpolation to reindex missing timestamps – Oli Jun 01 '20 at 17:14
  • 1
    I also don't understand the issue; you should expect different row lengths when resampling at different frequencies. If the same data are resampled, I don't see how 30 min and 60 min are resulting in equal output lengths (there should be about 2x as many entries for 30min) (correct me if I am wrong) – Tom Jun 02 '20 at 16:25
  • We'd love to help you solve the problem, but cannot do it without an example. – Itamar Mushkin Jun 03 '20 at 08:12
  • Hi, thanks for the responses. I will provide an example shortly so you can understand it better. – esTERY Jun 04 '20 at 09:05
  • Try stratified sampling of the data; see the following [link](https://stackoverflow.com/questions/44114463/stratified-sampling-in-pandas) – Yash Gupta Jun 05 '20 at 08:23

1 Answer

0

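If you're just trying to align the different datasets to one index, you can use pd.concat with axis=1. It joins the dataframes on the union of their datetime indexes and fills the timestamps that a coarser dataframe lacks with NaN.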

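import pandas as pd

# 12.5 blocks of 240 minutes; date_range expects an integer number of periods
periods = int(12.5 * 240)
index = pd.date_range(start='1/1/2018', periods=periods, freq="min")

# one row per minute, values 0..periods-1
data = pd.DataFrame(list(range(periods)), index=index)

# take the value at the start of each bin for every timeframe
df1 = data.resample('30min').asfreq()
df2 = data.resample('60min').asfreq()
df3 = data.resample('240min').asfreq()

# axis=1 aligns the three frames on the union of their datetime indexes
df4 = pd.concat([df1, df2, df3], axis=1)
print(df4)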

Output:

2018-01-01 00:00:00     0     0.0     0.0
2018-01-01 00:30:00    30     NaN     NaN
2018-01-01 01:00:00    60    60.0     NaN
2018-01-01 01:30:00    90     NaN     NaN
2018-01-01 02:00:00   120   120.0     NaN
...                   ...     ...     ...
2018-01-02 23:30:00  2850     NaN     NaN
2018-01-03 00:00:00  2880  2880.0  2880.0
2018-01-03 00:30:00  2910     NaN     NaN
2018-01-03 01:00:00  2940  2940.0     NaN
2018-01-03 01:30:00  2970     NaN     NaN
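
If the goal is instead for the 240min dataframe to have one row per timestamp of the finer dataframes, with no NaN gaps, one possible approach is to reindex it onto the finer index and forward-fill. The following is only a sketch on mock data, not the asker's actual dataframes: the column name `value`, the use of `.first()` as the aggregation, and the names `df30`, `df240` and `df240_aligned` are all assumptions made for illustration.

```python
import pandas as pd

# Mock data (assumption): 3000 one-minute samples, values 0..2999.
periods = int(12.5 * 240)
index = pd.date_range(start="2018-01-01", periods=periods, freq="min")
data = pd.DataFrame({"value": range(periods)}, index=index)

# Resample to two timeframes; .first() is just an example aggregation.
df30 = data.resample("30min").first()
df240 = data.resample("240min").first()

# Reindex the coarse frame onto the fine frame's index and forward-fill,
# so each 30-minute row carries the value of the 240-minute bin it falls in.
df240_aligned = df240.reindex(df30.index, method="ffill")

print(len(df30), len(df240), len(df240_aligned))
print(df240_aligned.head(10))
```

After this, `df240_aligned` shares its index with `df30`, so the two can be compared row by row. Whether forward-filling is appropriate depends on what the 240min values represent; for some aggregations, leaving the NaN gaps from pd.concat may be the more honest choice.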
Troy D