Select consecutive rows using Timedate function in Pandas

Question

I am analysing meteorological data which is taken at non-standard intervals so the time/date entries are not all consecutive. The csv file has been read into a pandas dataframe using:

df4 = pd.read_csv(datafilenew, parse_dates=[1], infer_datetime_format=True, na_values=['M'])
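A self-contained sketch of this kind of call follows. It uses hypothetical inline data in which the timestamps sit in the second column (index 1) and missing readings are written as M. Note that `parse_dates` must be passed as a keyword argument. `infer_datetime_format` is left out because pandas 2.0 deprecated it, since format inference is now the default.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: column 1 holds the timestamps, 'M' marks a missing reading
csv_text = """Station,TimeDate,Direction
A,2011-09-10 00:00,220
A,2011-09-10 01:00,M
A,2011-09-10 02:00,240
"""

# parse_dates=[1] converts the second column to datetime64;
# na_values=['M'] turns every 'M' into NaN
df4 = pd.read_csv(StringIO(csv_text), parse_dates=[1], na_values=['M'])
print(df4.dtypes)
```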

I want to select chunks of data that are consecutive, for example every set of at least 5 rows with consecutive time/date values. Here is a screenshot of a section of the data. In this example I would want to select all the entries from 2011-09-10, then continue scanning the rest of the data and select any other consecutive sets of rows.

Is there a simple way to do this? I am completely at a loss. Thanks.

Would `groupby` be what you're looking for? After grouping rows, you could drop those groups that do not contain at least 5 entries. – rahlf23, Apr 03 '18 at 14:25
Please add an example of the input and output format of the dataframe. – Rao Sahab, Apr 03 '18 at 14:27
I've added an image of the dataframe output as I cannot think of a way to make a simplified example. – Bethany, Apr 03 '18 at 14:37

Josh Wilkins · Answer 1 · 2018-04-03T14:47:24.933


See this question for your answer.

Otherwise, you could probably use the pandas `diff()` method (see here) together with the pandas `where()` method (see here) to find the indices where `diff()` of the time series equals the timedelta you are looking for.
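Below is a minimal sketch of the `diff()` idea. It assumes the frame is sorted by a datetime column named `TimeDate` and that readings are nominally one hour apart. The 1-hour step is a hypothetical choice, so substitute your own interval. Whenever the gap to the previous row differs from the expected step, a new run starts. `cumsum()` turns those break points into run labels, and `groupby(...).filter()` keeps only runs of at least 5 rows. This swaps `where()` for `filter()` on the run labels.

```python
import pandas as pd

# Hypothetical hourly data with a gap after 04:00
times = pd.to_datetime([
    "2011-09-10 00:00", "2011-09-10 01:00", "2011-09-10 02:00",
    "2011-09-10 03:00", "2011-09-10 04:00",
    "2011-09-10 07:00", "2011-09-10 08:00",
])
df4 = pd.DataFrame({"TimeDate": times, "Direction": range(len(times))})

step = pd.Timedelta(hours=1)  # assumed sampling interval
min_len = 5                   # minimum run length to keep

# True wherever the gap to the previous row is not the expected step
# (the first row's diff is NaT, so it always starts a new run)
breaks = df4["TimeDate"].diff() != step
run_id = breaks.cumsum()  # label each consecutive run 1, 2, 3, ...

# Keep only rows belonging to runs with at least min_len rows
consecutive = df4.groupby(run_id).filter(lambda g: len(g) >= min_len)
print(consecutive)
```

Unlike grouping by calendar date, this approach also catches runs that cross midnight and splits a single day that contains a gap.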

edited Apr 03 '18 at 14:47

answered Apr 03 '18 at 14:30


Anton vBR · Accepted Answer · 2018-04-03T15:02:47.237


You could try this (assuming your dataframe is sorted):

m = df4.groupby([df4['TimeDate'].dt.date])['Direction'].transform('size') >= 5  # mask: True for rows whose date has at least 5 entries
df4 = df4.loc[m]  # keep only the masked rows
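An equivalent way to write the same selection uses `DataFrameGroupBy.filter()`, which drops every group (here, every calendar date) with fewer than 5 rows in one step. This is an alternative idiom rather than the answer's original code, and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample: five readings on 2010-01-05, one on 2010-01-06
df4 = pd.DataFrame({
    "TimeDate": pd.to_datetime([
        "2010-01-05 10:00", "2010-01-05 11:00", "2010-01-05 12:00",
        "2010-01-05 13:00", "2010-01-05 14:00", "2010-01-06 13:00",
    ]),
    "Direction": [2, 3, 4, 5, 6, 7],
})

# Group rows by calendar date and keep only dates with at least 5 rows
kept = df4.groupby(df4["TimeDate"].dt.date).filter(lambda g: len(g) >= 5)
print(kept)
```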

Full example:

import pandas as pd
from io import StringIO  # pd.compat.StringIO no longer exists in recent pandas versions

data1 = '''\
TimeDate,Direction
2010-01-05 10:00,2
2010-01-05 11:00,3
2010-01-05 12:00,4
2010-01-05 13:00,5
2010-01-05 14:00,6
2010-01-06 13:00,7'''

df4 = pd.read_csv(StringIO(data1), sep=',', parse_dates=['TimeDate'])
# Keep only the rows whose calendar date has at least 5 entries
df4 = df4[df4.groupby([df4['TimeDate'].dt.date])['Direction'].transform('size') >= 5]
print(df4)

Or, as suggested in the comments, if you want to do something with each sub-dataframe, you could simply do this:

for ind, dfx in df4.groupby([df4['TimeDate'].dt.date]):
    if len(dfx) >= 5:
        # Apply your logic here for subdataframe with len >= 5
        print(dfx)
    else:
        # Apply logic for skipped subdataframes
        print('skip')
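If you keep the loop but want the qualifying sub-dataframes to end up in a single new dataframe, one option is to collect them in a list and combine them with `pd.concat()` after the loop. This is a sketch: the list name `pieces` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample: five readings on 2010-01-05, one on 2010-01-06
df4 = pd.DataFrame({
    "TimeDate": pd.to_datetime([
        "2010-01-05 10:00", "2010-01-05 11:00", "2010-01-05 12:00",
        "2010-01-05 13:00", "2010-01-05 14:00", "2010-01-06 13:00",
    ]),
    "Direction": [2, 3, 4, 5, 6, 7],
})

pieces = []  # sub-dataframes that pass the length check
for day, dfx in df4.groupby(df4["TimeDate"].dt.date):
    if len(dfx) >= 5:
        pieces.append(dfx)

# Combine the kept groups; fall back to an empty frame with the same columns
new_df = pd.concat(pieces) if pieces else df4.iloc[0:0]
print(new_df)
```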

edited Apr 03 '18 at 15:02

answered Apr 03 '18 at 14:54


Thank you, I have managed to use your second suggestion to group the data. I was hoping to put this data into a new data frame; is there a way to do this? Sorry if this is a stupid question, I am very new to programming. – Bethany Apr 03 '18 at 15:32
@Bethany no worries. Just try to show us or explain in detail what this new dataframe should look like. – Anton vBR Apr 03 '18 at 15:36

Apologies, your first suggestion does this perfectly; I must have done something wrong the first time I tried it. – Bethany Apr 03 '18 at 16:13
