
I have measurement data recorded at 10-minute intervals. Sometimes, however, the interval is 9 min 59 s or 10 min 01 s, and sometimes a value is missing, so the interval becomes 20 min.

What I want the code to do is resample to 10 min values, which I have already implemented. The problem is that measurements whose timestamps do not fall exactly on the 10:00 grid (the 9 min 59 s or 10 min 01 s cases) are lost, and I would like to keep this data.

Here is my testing code:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=('Datetime','V_L1','V_H3_L1','V_H3_L1_in_P'))

df['Datetime'] = ['01.01.2012 00:00:00', '01.01.2012 00:10:01', '01.01.2012 00:29:59','01.01.2012 00:50:00']
df['V_L1'] = [219,219.7,np.nan,220.3]
df['V_H3_L1'] = [3,1,2.5, np.nan]
df['Datetime'] = pd.to_datetime(df['Datetime'])
df = df.set_index('Datetime').resample('600S').asfreq()

Output:

                  V_L1  V_H3_L1  V_H3_L1_in_P
Datetime                                         
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00    NaN      NaN           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      NaN           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN

Wished output:

                  V_L1  V_H3_L1  V_H3_L1_in_P
Datetime                                         
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00  219.7      1.0           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      2.5           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN

So I want to keep a measurement if its timestamp deviates from the 10 min grid (600 s) by no more than a few seconds, say ±5 seconds.

T. Novais
  • `df = df.set_index('Datetime').resample('600S').first()`? Can also use `.mean()`, `.last()`, etc. – BallpointBen May 09 '19 at 20:49
  • Since it seems you want the `10:01` to go with :10, and the `29:59` to go with :20, seems you may want to do `df['Datetime'] = df.Datetime.dt.floor('10min')` before the resample? Otherwise, there's also `.ceil()` or `.round()` – ALollz May 09 '19 at 20:50
  • @ALollz, actually the wished output contained an error: I want 29:59 to round to :30! Sorry for that, I have corrected the wished output now. – T. Novais May 09 '19 at 21:05

2 Answers

df['Datetime'] = df['Datetime'].dt.round('min')
df = df.set_index('Datetime').resample('600S').asfreq()

Round the datetime to the nearest minute, then you can set_index and resample.
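If the ±5 second tolerance from the question should be enforced explicitly, one possible variant is to round to the 10 min grid and keep only the rows that lie close enough to it. This is a minimal sketch, not part of the original answer: the names `tolerance`, `snapped`, `within` and `result` are made up for illustration, and the sample frame is rebuilt from the question's data (without the empty `V_H3_L1_in_P` column).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Datetime': pd.to_datetime(
        ['01.01.2012 00:00:00', '01.01.2012 00:10:01',
         '01.01.2012 00:29:59', '01.01.2012 00:50:00'],
        format='%d.%m.%Y %H:%M:%S'),
    'V_L1': [219, 219.7, np.nan, 220.3],
    'V_H3_L1': [3, 1, 2.5, np.nan],
})

# Assumed tolerance, taken from the "+ or - 5 seconds" in the question.
tolerance = pd.Timedelta(seconds=5)

# Nearest point on the 10 min grid for every timestamp.
snapped = df['Datetime'].dt.round('10min')

# True where the original timestamp is within the tolerance of the grid.
within = (df['Datetime'] - snapped).abs() <= tolerance

# Keep only the accepted rows, move them onto the grid, then fill the gaps.
result = (df.loc[within]
            .assign(Datetime=snapped[within])
            .set_index('Datetime')
            .resample('10min')
            .asfreq())
print(result)
```

Rows further than the tolerance from the grid are dropped here rather than shifted. If two measurements could land on the same 10 min slot, `.asfreq()` may fail on the duplicated index; an aggregation such as `.first()` after the resample, as suggested in the comments on the question, is one way around that.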

Ben Pap
  • Your solution works very well when the deviation stays within a minute! I think it is the cleanest solution for the problem as described. I have, though, added another (less pretty) answer for a more general rounding to the 10 min values I want. – T. Novais May 09 '19 at 22:18

Well, I wrote a function which is not very beautiful (I must admit), but it does what I want it to do. As I'm dealing with a lot of data, I think this might be a safe approach. Basically, with an if/elif structure, the function checks the minute part of the Timestamp and, according to its value, decides whether to round up or down. I'm quite sure there are better ways to solve this; please share if you have one.

  • if minute >= 55, round to the next full hour; elif >= 45, to 50; elif >= 35, to 40; and so on.

So, the code is:

import datetime

def round_time(time):
    if time.minute>=55:
        # Drop minutes and seconds, then add one hour; the timedelta arithmetic
        # rolls over to the next day on its own when the hour is 23.
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(hours=1)
    elif time.minute >=45:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=50)
    elif time.minute >=35:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=40)
    elif time.minute >=25:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=30)
    elif time.minute >=15:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=20)
    elif time.minute >=5:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=10)
    elif time.minute >=0:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=0)
    return rounded

df['Datetime'] = df['Datetime'].apply(round_time)
df = df.set_index('Datetime').resample('600S').asfreq()

I got the idea from the question "How do I round datetime column to nearest quarter hour".

Although the solution in that thread didn't work for the 10 min values (29 min was still being rounded to 20 instead of the 30 that I want), it was a good reference!
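For comparison, the `.round()` accessor mentioned in the comments on the question can do the same kind of rounding without a hand-written function, including the roll-over into the next hour or day. This is only a sketch on made-up sample timestamps (the names `times` and `rounded` are illustrative), not the code I actually ran on my data.

```python
import pandas as pd

# Hypothetical sample timestamps covering the hour and day roll-over cases.
times = pd.Series(pd.to_datetime([
    '2012-01-01 00:10:01',
    '2012-01-01 00:29:59',
    '2012-01-01 00:56:30',
    '2012-01-01 23:57:00',
]))

# Round every timestamp to the nearest multiple of 10 minutes.
rounded = times.dt.round('10min')
print(rounded)
```

One difference to keep in mind: the if/elif function looks only at the minute and always rounds a minute value of 5, 15, ... upwards, while `.round('10min')` also takes the seconds into account, and exact half-way values such as 00:05:00 may be resolved differently. It is worth checking those cases if they occur in the data.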

T. Novais