
I have data like this:

         ID     datetime           
0         2  2015-01-09 19:05:39   
1         1  2015-01-10 20:33:38   
2         1  2015-01-10 21:10:00 

I've converted this datetime into a Unix timestamp:

         ID   timestamp           
0         2  1420830339
1         1  1420922018   
2         1  1420924200 
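For reference, these integers are exactly what you get when the naive wall-clock values above are read as UTC. That detail matters for the EST conversion. Here is a minimal pandas sketch that assumes no time zone is attached to the strings:

```python
import pandas as pd

df = pd.DataFrame({'ID': [2, 1, 1],
                   'datetime': ['2015-01-09 19:05:39',
                                '2015-01-10 20:33:38',
                                '2015-01-10 21:10:00']})

# Seconds elapsed since 1970-01-01 00:00:00, treating the naive
# wall-clock values as UTC (assumption: no time zone attached).
dt = pd.to_datetime(df['datetime'])
df['timestamp'] = (dt - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(df[['ID', 'timestamp']])
```

Floor-dividing by a one-second `Timedelta` yields integer seconds directly, so you never need to handle the nanosecond storage yourself.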

I want to first convert the Unix timestamp to US Eastern time (EST) and then bin each row into a 10-minute interval. I need a column that indicates which bin each row belongs to.

My minimum datetime is 2015-01-01 00:00:00, and I only have data for January 2015 (the 1st through the 31st).
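Every row falls inside January 2015 and the first edge is 2015-01-01 00:00:00, so the 10-minute grid has a fixed size. This quick check assumes pandas and edges that include both endpoints (the `date_range` default):

```python
import pandas as pd

# Edges every 10 minutes from 2015-01-01 00:00 through 2015-02-01 00:00 inclusive.
edges = pd.date_range('2015-01-01', '2015-02-01', freq='10min')
n_bins = len(edges) - 1          # n edges delimit n - 1 intervals
print(len(edges), n_bins)        # 4465 4464  (31 days * 24 h * 6 bins per hour)
```

The bin labels therefore run from 0 to 4463.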

How can I achieve this using Python or pandas?

user_12

1 Answer


Use `date_range` with `cut` to bin by 10 minutes:

```python
import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'ID': [2, 1, 1],
                   'datetime': ['2015-01-09 19:05:39',
                                '2015-01-10 20:33:38',
                                '2015-01-10 21:10:00']})
df['datetime'] = pd.to_datetime(df['datetime'])

# 10-minute edges covering January 2015 ('10min' is the current spelling of '10T')
bins = pd.date_range('2015-01-01', '2015-02-01', freq='10min')
df['bins'] = pd.cut(df['datetime'], bins)               # interval each row falls in
df['lab'] = pd.cut(df['datetime'], bins, labels=False)  # bin number 0, 1, 2, ...

df['bins_left'] = pd.IntervalIndex(df['bins']).left
df['bins_right'] = pd.IntervalIndex(df['bins']).right

# datetime64[ns] -> int64 nanoseconds since 1970-01-01; // 10**9 gives seconds
df['bins_left_unix'] = df['bins_left'].to_numpy().astype(np.int64) // 10**9
df['bins_right_unix'] = df['bins_right'].to_numpy().astype(np.int64) // 10**9

print(df)
```

```
   ID            datetime                                        bins   lab  \
0   2 2015-01-09 19:05:39  (2015-01-09 19:00:00, 2015-01-09 19:10:00]  1266   
1   1 2015-01-10 20:33:38  (2015-01-10 20:30:00, 2015-01-10 20:40:00]  1419   
2   1 2015-01-10 21:10:00  (2015-01-10 21:00:00, 2015-01-10 21:10:00]  1422   

            bins_left          bins_right  bins_left_unix  bins_right_unix  
0 2015-01-09 19:00:00 2015-01-09 19:10:00      1420830000       1420830600  
1 2015-01-10 20:30:00 2015-01-10 20:40:00      1420921800       1420922400  
2 2015-01-10 21:00:00 2015-01-10 21:10:00      1420923600       1420924200  
```
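The following illustrates why the code divides by `10**9`. A `datetime64[ns]` value is stored as an integer count of nanoseconds since 1970-01-01, so casting it to int64 yields nanoseconds, and `// 10**9` turns that count into seconds. This sketch uses plain NumPy with an explicit `'ns'` unit to mirror a `datetime64[ns]` column:

```python
import numpy as np

# Same storage unit pandas uses for datetime64[ns] columns.
t = np.datetime64('2015-01-09T19:00:00', 'ns')
ns = t.astype(np.int64)          # nanoseconds since 1970-01-01 00:00:00
print(ns)                        # 1420830000000000000
print(ns // 10**9)               # 1420830000 (seconds, matches bins_left_unix)
print(ns // 10**9 // 60)         # 23680500 (minutes, if minutes are wanted)
```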
jezrael
  • I forgot to mention that I also need a column indicating which 10-minute bin each row belongs to. Updated the question. – user_12 Aug 06 '19 at 06:34
  • @user_12 - Can you add the expected output? – jezrael Aug 06 '19 at 06:35
  • Let me explain. If I have datetimes between 2015-01-01 00:00:00 and 2015-01-01 00:10:00, those rows should belong to the first bin, and rows between 2015-01-01 00:10:00 and 2015-01-01 00:20:00 belong to another bin. I need [0, 1, ...] indicating the bin. – user_12 Aug 06 '19 at 06:38
  • Thank you. Let me check it with my original data. – user_12 Aug 06 '19 at 06:42
  • Also, can you tell me why you divided it by 10**9? If possible, maybe add a comment there. – user_12 Aug 06 '19 at 06:58
  • @user_12 - Because it is unix time in seconds - [check](https://stackoverflow.com/questions/15203623/convert-pandas-datetimeindex-to-unix-time) – jezrael Aug 06 '19 at 07:14
  • I know Unix time is the number of seconds since Jan 1, 1970 (UTC), right? So why are you dividing it by 10**9? **I didn't get that; can you please clarify? Does it convert it to minutes?** That link was not helpful. – user_12 Aug 06 '19 at 08:32
  • @user_12 - I think not; it was converted to Unix time in seconds. If you need minutes, try dividing by `60`. – jezrael Aug 06 '19 at 09:00
  • `[1420830339, 1420922018, 1420924200]` is already in seconds, right, jezrael? Why do we have to convert it to seconds again? Is there any logic behind it? – user_12 Aug 06 '19 at 09:04
  • @user_12 - `why are you dividing it by 10**9?` - because the default unit is `ns` (nanoseconds). – jezrael Aug 06 '19 at 09:06
  • @user_12 - I think you can do it with `.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')`, but I'm not sure whether it changes the output. – jezrael Aug 06 '19 at 09:11
  • Assume this is my original data: `2015-01-09 19:05:39 (EST)`. I converted it into Unix time with `time.mktime()` and it returned `1420830339` (GMT). How can I convert it to EST before binning?
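The thread leaves the EST step and the 0, 1, ... labels open. This is a minimal sketch under two labelled assumptions. If the integers are true UTC instants, convert them with `tz_convert`. If the original wall-clock strings were already Eastern time (as the last comment suggests), attach the zone with `tz_localize` so the clock is not shifted. Note that 1420830339 is 2015-01-09 19:05:39 read as UTC, so `tz_convert` moves it to 14:05:39 EST. The helper name `ten_minute_bin` is hypothetical. It floor-divides the offset from 2015-01-01 00:00 Eastern, which gives left-closed bins ([00:00, 00:10) is 0):

```python
import pandas as pd

unix = pd.Series([1420830339, 1420922018, 1420924200])

# Case 1 (assumption): the integers are true UTC instants -> convert to US/Eastern.
eastern = pd.to_datetime(unix, unit='s', utc=True).dt.tz_convert('US/Eastern')

# Case 2 (assumption): the original wall-clock strings were already Eastern time,
# so attach the zone without shifting the clock (tz_localize, not tz_convert).
wall = pd.to_datetime(pd.Series(['2015-01-09 19:05:39',
                                 '2015-01-10 20:33:38',
                                 '2015-01-10 21:10:00']))
eastern_wall = wall.dt.tz_localize('US/Eastern')


def ten_minute_bin(ts):
    """Hypothetical helper: left-closed 10-minute bin number from 2015-01-01 00:00 Eastern."""
    start = pd.Timestamp('2015-01-01 00:00:00', tz='US/Eastern')
    # tz-aware minus tz-aware -> Timedelta; floor-divide by 10 minutes -> int bin number
    return (ts - start) // pd.Timedelta(minutes=10)


print(eastern.tolist())
print(ten_minute_bin(eastern).tolist())       # [1236, 1389, 1393]
print(ten_minute_bin(eastern_wall).tolist())  # [1266, 1419, 1423]
```

`pd.cut` defaults to right-closed intervals, so a row exactly on an edge (21:10:00) is labelled 1422 in the answer and 1423 here. Passing `right=False` to `pd.cut` gives the same left-closed convention as the floor division.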