
I have the following dataframe:

df=
    Record_ID       Time
        94704   2014-03-10 07:19:19.647342
        94705   2014-03-10 07:21:44.479363
        94706   2014-03-10 07:21:45.479581
        94707   2014-03-10 07:21:54.481588
        94708   2014-03-10 07:21:55.481804
        94709   2014-03-10 07:21:56.482029
        94710   2014-03-10 07:21:57.482254
        94711   2014-03-10 07:21:58.482473
        94712   2014-03-10 07:21:59.482706
        94713   2014-03-10 07:22:00.482917
        94714   2014-03-10 07:22:01.483279
        94715   2014-03-10 07:22:02.483545
        94716   2014-03-10 07:22:03.383563
        94717   2014-03-10 07:22:04.383786
        94718   2014-03-10 07:22:09.485624
        94719   2014-03-10 07:22:10.385118
        94720   2014-03-10 07:22:11.485454
        94721   2014-03-10 07:22:12.485592
        94722   2014-03-10 07:22:15.486335
        94723   2014-03-10 07:22:16.486475
        94724   2014-03-10 07:22:17.487023
        94725   2014-03-10 07:22:18.387020
        94726   2014-03-10 07:22:19.387120
        94727   2014-03-10 07:22:20.387379
        94728   2014-03-10 07:22:22.387786
        94729   2014-03-10 07:22:23.488032
        94730   2014-03-10 07:22:24.388232
        94731   2014-03-10 07:22:30.489594

I would like to know how to create a new dataframe that keeps only one record every 60 seconds, in order to reduce the size of the table.

emax

2 Answers


You first need to set the index to your Time column in the DataFrame. You then resample as follows:

resampled = df.set_index('Time').resample('1min', how='first')
>>> resampled
                     Record_ID
Time                          
2014-03-10 07:19:00      94704
2014-03-10 07:20:00        NaN
2014-03-10 07:21:00      94705
2014-03-10 07:22:00      94713

Note that you get a NaN for 07:20 because there were no records during this interval. You can, of course, drop NaNs if desired.

>>> resampled.dropna()
                     Record_ID
Time                          
2014-03-10 07:19:00      94704
2014-03-10 07:21:00      94705
2014-03-10 07:22:00      94713
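
A side note: in newer versions of pandas the `how` argument of `resample` has been removed, so if you are on a recent install the equivalent of the call above (same logic, just the current API) is to chain the aggregation onto the resampler:

resampled = df.set_index('Time').resample('1min').first()
resampled.dropna()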
Alexander
  • Thank you, it works but at the end many points are missing. However this is a great solution – emax Sep 25 '15 at 21:38
  • It is difficult to give further advice without understanding your data. Perhaps you can use `mean` instead of `first`, or is no data available during the time period? – Alexander Sep 25 '15 at 21:41
  • Hello, this method is perfect for my purpose. Thank you again. – emax Sep 28 '15 at 15:26

I picked up a function called roundTime from here: How to round the minute of a datetime object python

I put your sample data in a file called data.csv.

import datetime

import pandas as pd


def roundTime(dt=None, roundTo=60):
    """Round a datetime object to any time lapse in seconds.
    dt : datetime.datetime object, default now.
    roundTo : closest number of seconds to round to, default 1 minute.
    Author: Thierry Husson 2012 - Use it as you want but don't blame me.
    """
    if dt is None:
        dt = datetime.datetime.now()
    seconds = (dt - dt.min).seconds
    # // is floor division, not a comment on the following line:
    rounding = (seconds + roundTo / 2) // roundTo * roundTo
    return dt + datetime.timedelta(0, rounding - seconds, -dt.microsecond)


df = pd.read_csv('data.csv')
df['Time'] = pd.to_datetime(df['Time'])
df['Time'] = df['Time'].map(lambda x: roundTime(x))

# now group by the rounded Time and select, say, the first record
print(df.groupby('Time').min())
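
If you literally want the first record of each rounded minute rather than the minimum, `groupby` also offers a `first` aggregation; assuming the rows are already in time order (as in your sample), it gives the same result:

print(df.groupby('Time').first())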

Or here is an alternative if you don't want to do a groupby:

df['Time'] = pd.to_datetime(df['Time'])
df['Time'] = df['Time'].map(lambda x: roundTime(x))
# keep only the rows where the rounded Time differs from the previous row
slice_criteria = df['Time'].diff() != pd.Timedelta(0)
print(df[slice_criteria])
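
For time-ordered data like this you can get the same effect without building the mask yourself by using `drop_duplicates`, which by default keeps the first row for each rounded `Time`:

print(df.drop_duplicates(subset='Time'))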
Dickster