I am analysing meteorological data which is taken at non-standard intervals so the time/date entries are not all consecutive. The csv file has been read into a pandas dataframe using:

df4 = pd.read_csv(datafilenew, parse_dates=[1], infer_datetime_format=True, na_values=['M'])

I want to select chunks of data which are consecutive, for example every set of at least 5 rows which have consecutive time/date values. Here is a screenshot of a section of data. I would want to select all the entries from 2011-09-10 from this example and then continue to scan the rest of the data and select other consecutive sets of rows.

Is there a simple way to do this? I am completely at a loss. Thanks.

Bethany

2 Answers

See this question for your answer.

Otherwise, you could probably use the pandas diff() method (see here) together with the pandas where() method (see here) to find the indices where diff(timeseries) equals the timedelta you are looking for.
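As a rough sketch of the diff() idea (not necessarily what the linked answers do), the following labels runs of rows that are exactly one step apart and keeps the runs with at least five rows. The sample data, the one-hour step, and the names step, min_len and run_id are assumptions made for illustration. It uses a boolean mask with cumsum() and groupby() rather than where().

```python
import io
import pandas as pd

# Hypothetical sample data; the column names and the one-hour step are assumptions.
data = '''\
TimeDate,Direction
2011-09-10 00:00,1
2011-09-10 01:00,2
2011-09-10 02:00,3
2011-09-10 03:00,4
2011-09-10 04:00,5
2011-09-10 09:00,6
2011-09-10 10:00,7
2011-09-12 00:00,8'''

df = pd.read_csv(io.StringIO(data), parse_dates=['TimeDate'])
df = df.sort_values('TimeDate').reset_index(drop=True)

step = pd.Timedelta(hours=1)  # assumed sampling interval
min_len = 5                   # minimum number of rows in a run

# A new run starts wherever the gap to the previous row is not exactly one step.
# The first row has no predecessor (its diff is NaT), so it also starts a run.
new_run = df['TimeDate'].diff() != step
run_id = new_run.cumsum()

# Keep only the rows that belong to a run of at least min_len rows.
run_size = df.groupby(run_id)['TimeDate'].transform('size')
consecutive = df[run_size >= min_len]
print(consecutive)
```

A gap inside a single day also starts a new run here, so only unbroken stretches are kept. If the sampling interval is not fixed, step would need to be replaced by whatever rule defines "consecutive" for the data.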

Josh Wilkins

You could try this (assuming your dataframe is sorted):

m = df4.groupby([df4['TimeDate'].dt.date])['Direction'].transform('size') >= 5 # mask
df4 = df4.loc[m] # Apply mask

Full example:

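import io
import pandas as pd

data1 = '''\
TimeDate,Direction
2010-01-05 10:00,2
2010-01-05 11:00,3
2010-01-05 12:00,4
2010-01-05 13:00,5
2010-01-05 14:00,6
2010-01-06 13:00,7'''

# io.StringIO replaces pd.compat.StringIO, which newer pandas versions no longer provide
df4 = pd.read_csv(io.StringIO(data1), sep=',', parse_dates=['TimeDate'])
# Keep only the rows whose date occurs at least 5 times, and assign the result back
df4 = df4[df4.groupby([df4['TimeDate'].dt.date])['Direction'].transform('size') >= 5]
print(df4)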

Or, as suggested in the comments, if you want to do something with each sub-dataframe, you could simply do this:

for ind, dfx in df4.groupby([df4['TimeDate'].dt.date]):
    if len(dfx) >= 5:
        # Apply your logic here for subdataframe with len >= 5
        print(dfx)
    else:
        # Apply logic for skipped subdataframes
        print('skip')
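
If the qualifying sub-dataframes should end up in one new dataframe, one possible sketch is to collect them in a list and concatenate them. The names kept and df_new are hypothetical, and the sample data is the same as in the full example.

```python
import io
import pandas as pd

data1 = '''\
TimeDate,Direction
2010-01-05 10:00,2
2010-01-05 11:00,3
2010-01-05 12:00,4
2010-01-05 13:00,5
2010-01-05 14:00,6
2010-01-06 13:00,7'''

df4 = pd.read_csv(io.StringIO(data1), parse_dates=['TimeDate'])

# Collect every per-date group that has at least 5 rows.
kept = [dfx for _, dfx in df4.groupby(df4['TimeDate'].dt.date) if len(dfx) >= 5]

# Concatenate the groups; fall back to an empty frame with the same columns if none qualify.
df_new = pd.concat(kept, ignore_index=True) if kept else df4.iloc[0:0]
print(df_new)
```

This should give the same rows as the mask version, just built group by group.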
Anton vBR
  • Thank you, I have managed to use your second suggestion to group the data. I was hoping to put this data into a new data frame, is there a way to do this? Sorry if this is a stupid question, I am very new to programming – Bethany Apr 03 '18 at 15:32
  • @Bethany no worries. Just try to show us or explain in detail what this new dataframe should look like. – Anton vBR Apr 03 '18 at 15:36
  • Apologies, your first suggestion does this perfectly. I must have done something wrong the first time I tried it – Bethany Apr 03 '18 at 16:13