3

i have dataframe that is cross the day(12-02~1203) belows.. and i wanna move yesterday data(12-02 22:00~00:00) to today data(12-03) everyday. date/time is mutli-indexed. this is needed when i analyze data, it's more convenient day by day. but now i need to analyze data including yesterday last 2hours... so i need this dataframe manipulation.

..
 date         time       a     b 
2015-12-02  21:00:00    23.97   0
2015-12-02  21:15:00    24.06   0
2015-12-02  21:30:00    24.03   0
2015-12-02  21:45:00    23.99   0
2015-12-02  22:00:00    24.03   0
2015-12-02  22:15:00    23.89   0
2015-12-02  22:30:00    23.71   0
2015-12-02  22:45:00    23.64   0
2015-12-02  23:00:00    23.29   0
2015-12-02  23:15:00    23.8    0
2015-12-02  23:30:00    23.82   0
2015-12-02  23:45:00    23.86   0
2015-12-03  0:00:00 23.66   0
2015-12-03  0:15:00 23.64   0
2015-12-03  0:30:00 23.7    0
2015-12-03  0:45:00 23.69   0
2015-12-03  1:00:00 23.65   0
2015-12-03  1:15:00 23.48   0
2015-12-03  1:30:00 23.45   0
..

results should be like below(12-02 22:00~23:45 data is moved to 12-03 how can i do it ?

..
2015-12-02  21:00:00    23.97   0
2015-12-02  21:15:00    24.06   0
2015-12-02  21:30:00    24.03   0
2015-12-02  21:45:00    23.99   0
2015-12-03  22:00:00    24.03   0
2015-12-03  22:15:00    23.89   0
2015-12-03  22:30:00    23.71   0
2015-12-03  22:45:00    23.64   0
2015-12-03  23:00:00    23.29   0
2015-12-03  23:15:00    23.8    0
2015-12-03  23:30:00    23.82   0
2015-12-03  23:45:00    23.86   0
2015-12-03  0:00:00 23.66   0
2015-12-03  0:15:00 23.64   0
2015-12-03  0:30:00 23.7    0
2015-12-03  0:45:00 23.69   0
2015-12-03  1:00:00 23.65   0
2015-12-03  1:15:00 23.48   0
2015-12-03  1:30:00 23.45   0
..
jpp
  • 159,742
  • 34
  • 281
  • 339
jerry han
  • 425
  • 4
  • 15

3 Answers3

2

I think you need:

from datetime import date, datetime, time, timedelta

m = df.index.get_level_values(1) < time(22,0,0)
idx1 = df.index.get_level_values(0)
idx2 = df.index.get_level_values(1)
df.index = [idx1.where(m, idx1 +  timedelta(days=1)), idx2]

print (df)
                         a  b
date       time              
2015-12-02 21:00:00  23.97  0
           21:15:00  24.06  0
           21:30:00  24.03  0
           21:45:00  23.99  0
2015-12-03 22:00:00  24.03  0
           22:15:00  23.89  0
           22:30:00  23.71  0
           22:45:00  23.64  0
           23:00:00  23.29  0
           23:15:00  23.80  0
           23:30:00  23.82  0
           23:45:00  23.86  0
           00:00:00  23.66  0
           00:15:00  23.64  0
           00:30:00  23.70  0
           00:45:00  23.69  0
           01:00:00  23.65  0
           01:15:00  23.48  0
           01:30:00  23.45  0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

This way should be efficient. First extract the hour of each time, then increment by one day for hours >= 22.

import pandas as pd
from datetime import timedelta

df['hour'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour
df.loc[df['hour'] >=22, 'date'] = df['date'] +  timedelta(days=1)
jpp
  • 159,742
  • 34
  • 281
  • 339
  • when executing "df_per_wday.hour = pd.to_datetime(df_per_wday.time, format='%H:%M:%S').dt.hour" got error 'DataFrame' object has no attribute 'time' – jerry han Feb 05 '18 at 12:19
  • in my dataframe data,time is index is it effect error ? – jerry han Feb 05 '18 at 12:20
  • @jerryhan, try the code now. i've explicitly identified column using `df[col]` notation. – jpp Feb 05 '18 at 12:21
  • also try promoting index to columns: `df = df.reset_index()` – jpp Feb 05 '18 at 12:22
  • this also got errors... pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour does not execute ... date time are multi index ... – jerry han Feb 05 '18 at 12:35
  • looks like you should remove the multiindex first.. see [here](https://stackoverflow.com/questions/18624039/pandas-reset-index-on-series-to-remove-multiindex). essentially what i said earlier, use `df = df.reset_index()` first. – jpp Feb 05 '18 at 12:36
  • i tried "df['hour'] =pd.to_datetime(df.index.get_level_values(1), format='%H:%M:%S').dt.hour" and got error 'DatetimeIndex' object has no attribute 'dt'.. i need keep multi-index... so i tried this – jerry han Feb 05 '18 at 12:39
0

I'm not sure if this is the fastest way but you can consider to use np.where

import numpy as np
import pandas as pd

df["date"] = pd.to_datetime(df["date"])
offset = pd.DateOffset(days=1)
df["date"] = np.where((df["time"]>="22:00") & (df["time"]<="23:45" ),
                      df["date"] + offset,
                      df["date"])
rpanai
  • 12,515
  • 2
  • 42
  • 64