
I have a problem with my data preparation: I have two time series DataFrames which I want to merge into a 30-minute interval. The first dataset is in a 10-minute interval while the other is in a 15-minute interval; ideally it should be possible to join these to form a 30-minute interval DataFrame.

I tried the guide here, but I can't seem to get it to work, and I think it only allows the frequency 'H'. I also tried this SO question.

DF_A

    TIME    LEVELS_A
0   0   0
1   900 0
2   1800    0
3   2700    0
4   3600    0
5   4500    0


DF_B

    TIME    LEVELS_B
0   0   2.16
1   600 2.16
2   1200    2.12
3   1800    1.989382667
4   2400    1.989382667
5   3000    1.989382667

Expected results are:

DF_MERGED

   TIME  LEVELS_A          LEVELS_B
0     0
1  1800  2.16, 2.16, 2.16  0,0
2  3600  2.16, 2.16, 2.16  0,1
3  5400  2.16, 2.16, 2.16  1,0
4  7200  2.16, 2.16, 2.16  1,0
5  9000  2.16, 2.16, 2.16  0,0

Everything is already imputed, so it's unlikely to have any NaNs. Also, for every three LEVELS_A values there are two LEVELS_B values. How should this be merged with pd.DataFrame?

Or perhaps I just want to get the max of each entry, so it would be ...

DF_MERGED_V2

   TIME  LEVELS_A  LEVELS_B
0     0
1  1800  2.16      0
2  3600  2.16      1
3  5400  2.16      1
4  7200  2.16      1
5  9000  2.16      0

I want to do this programmatically with pandas.

deku
  • I don't understand how you got those expected result numbers from the data you provided. Can you please verify that the numbers are correct? – ecortazar Mar 29 '19 at 10:31
  • @ecortazar those are just sample values, sir, sorry for the confusion. I just wanted to show a sample dataset to merge – deku Mar 29 '19 at 11:03

1 Answer


In order to avoid any issues that might go unnoticed during the aggregation, I'd recommend translating the TIME column into actual datetimes first. Then what you are looking for is a simple group-by operation.

Here is my proposal:

Load Data:

a = '''TIME    LEVELS_A
0   0   0
1   900 0
2   1800    0
3   2700    0
4   3600    0
5   4500    0
'''
b = '''TIME    LEVELS_B
0   0   2.16
1   600 2.16
2   1200    2.12
3   1800    1.989382667
4   2400    1.989382667
5   3000    1.989382667
'''

import io
import pandas as pd

# read_csv infers the unnamed leading column as the index
df_a = pd.read_csv(io.StringIO(a), sep='\s+')
df_b = pd.read_csv(io.StringIO(b), sep='\s+')

The Solution

import datetime as dt
import pandas as pd

reference_date = dt.datetime(2019, 1, 1)  # Arbitrary date used as a reference point

# Turn the integer TIME column (seconds) into a proper DatetimeIndex
df_a.index = reference_date + pd.to_timedelta(df_a['TIME'], unit='s')
df_b.index = reference_date + pd.to_timedelta(df_b['TIME'], unit='s')

# Collect all values that fall into each 30-minute bin into a list
new_a = df_a['LEVELS_A'].groupby(pd.Grouper(freq='30min')).apply(lambda x: x.tolist())
new_b = df_b['LEVELS_B'].groupby(pd.Grouper(freq='30min')).apply(lambda x: x.tolist())

merged_df = pd.concat({'LEVELS_A': new_a, 'LEVELS_B': new_b}, axis=1, sort=True)

merged_df.index = (merged_df.index - reference_date).seconds  # Back to the original TIME format (seconds)

The Output:

       LEVELS_A     LEVELS_B
0       [0, 0]     [2.16, 2.16, 2.12]
1800    [0, 0]     [1.989, 1.989, 1.989]
3600    [0, 0]     NaN
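
If you would rather have TIME back as a regular column, as in your original frames, one extra step does that (an optional sketch, not part of the code above):

merged_df = merged_df.rename_axis('TIME').reset_index()  # integer index becomes a 'TIME' column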

Sidenote:

If all you want is the maximum element in each list, add the following.

import numpy as np

# Take the max of each list; leave NaN where a bin had no values
merged_df.applymap(lambda x: max(x) if isinstance(x, list) else np.nan)

Output:

    LEVELS_A    LEVELS_B    
0       0       2.160000
1800    0       1.989383
3600    0       NaN
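
Alternatively, if the per-bin maximum is all you need, you could skip the intermediate lists and aggregate with max directly. This is only a sketch along the same lines as the code above, not part of the original snippet:

# Aggregate straight to the 30-minute maximum instead of collecting lists first
max_a = df_a['LEVELS_A'].groupby(pd.Grouper(freq='30min')).max()
max_b = df_b['LEVELS_B'].groupby(pd.Grouper(freq='30min')).max()

merged_max = pd.concat({'LEVELS_A': max_a, 'LEVELS_B': max_b}, axis=1, sort=True)
merged_max.index = (merged_max.index - reference_date).seconds  # back to seconds
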
ecortazar
  • Awesome, sir, this indeed works in the concatenation part http://prntscr.com/n4lfpn, but the values are all over the place. I just needed three values for LEVELS_A and two for LEVELS_B, or perhaps just to get the highest value of the three and the two and then merge them. – deku Mar 29 '19 at 11:20
  • I've expanded my solution to show the calculation I'm doing and the output I'm getting. I'm not sure what difference in the data is creating the issue, but this returns only 2 values in LEVELS_A and 3 values in LEVELS_B. – ecortazar Mar 29 '19 at 11:31