
I have a time column in a pandas DataFrame with more than 800,000 rows. The values look like this:

08:28:31
08:28:35
08:28:44
08:28:44

I want to expand each value into an hourly sequence. If the first time is 08:28:31, the values that follow should be 09:28:31, 10:28:31, and so on, one hour apart. How can I achieve this in Python using the `datetime` library?

Expected output:

08:28:31
09:28:31
...
23:28:31

08:28:35
09:28:35
...
23:28:35

08:28:44
...
08:28:44
...
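Because the question mentions the `datetime` library, here is a minimal pure-Python sketch of the expected expansion. It assumes the times are `HH:MM:SS` strings and that each sequence stops at hour 23, as in the expected output. `hourly_until_midnight` is a hypothetical helper name. The sketch loops in Python, so on 800,000 rows it is likely much slower than a vectorized pandas approach.

```python
from datetime import datetime, timedelta

def hourly_until_midnight(time_str):
    """Hypothetical helper: return time_str plus every later hour with the
    same minutes and seconds, up to and including hour 23."""
    start = datetime.strptime(time_str, "%H:%M:%S")
    # 24 - hour is the number of slots from the start hour through hour 23
    return [(start + timedelta(hours=i)).strftime("%H:%M:%S")
            for i in range(24 - start.hour)]

times = ["08:28:31", "08:28:35", "08:28:44", "08:28:44"]
expanded = [t for s in times for t in hourly_until_midnight(s)]
print(len(expanded))   # 64
print(expanded[:2])    # ['08:28:31', '09:28:31']
print(expanded[-1])    # 23:28:44
```

Duplicate input times, such as the two 08:28:44 rows, each produce their own full sequence, which matches the expected output.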
jezrael
Hamza

1 Answer


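Use `Index.repeat` to duplicate each row once per remaining hour of the day, then add the running count within each original row as an hour offset: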

```python
import pandas as pd

# convert values to datetimes (the strings are HH:MM:SS)
df['date'] = pd.to_datetime(df['date'], format='%H:%M:%S')

# repeat each row once per remaining hour of the day (hour 8 -> 16 rows)
df = df.loc[df.index.repeat(24 - df['date'].dt.hour)]
# number the repeats 0, 1, 2, ... within each original row and convert to hour timedeltas
hours = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='h')

# add the offsets, keep only the time part and reset the index to 0..n-1
s = df['date'].add(hours).dt.time.reset_index(drop=True)
print(s)
```
```
0     08:28:31
1     09:28:31
2     10:28:31
3     11:28:31
4     12:28:31
        ...
59    19:28:44
60    20:28:44
61    21:28:44
62    22:28:44
63    23:28:44
Length: 64, dtype: object
```
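To see what the repeat and cumcount steps do on their own, here is a minimal sketch on two hypothetical rows (not from the question), chosen late in the day so the expansion stays short. It assumes the DataFrame has a unique index, such as the default `RangeIndex`. With duplicate labels, `groupby(level=0)` would merge distinct rows into one group, so calling `reset_index(drop=True)` first would be needed.

```python
import pandas as pd

# Two hypothetical rows; parsing with an explicit HH:MM:SS format
df = pd.DataFrame({'date': pd.to_datetime(['21:00:05', '22:30:00'], format='%H:%M:%S')})

repeats = 24 - df['date'].dt.hour            # [3, 2]: hours 21..23 and 22..23
expanded = df.loc[df.index.repeat(repeats)]  # original labels are repeated
print(expanded.index.tolist())               # [0, 0, 0, 1, 1]

# cumcount restarts at 0 for each original label
offsets = expanded.groupby(level=0).cumcount()
print(offsets.tolist())                      # [0, 1, 2, 0, 1]

times = expanded['date'].add(pd.to_timedelta(offsets, unit='h')).dt.time
print([str(t) for t in times])
# ['21:00:05', '22:00:05', '23:00:05', '22:30:00', '23:30:00']
```

Because `offsets` carries exactly the same (repeated) index as `expanded`, the `add` pairs the values position by position.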
jezrael
  • What if we want to take the time every 15 minutes instead? Is that possible? – Hamza Mar 08 '21 at 11:10
  • @OfficialDeveloper - Can you try changing `hours = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='H')` to `hours = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='Min') * 15`? – jezrael Mar 08 '21 at 11:12
  • You told me to use `df.groupby('Time', as_index=False)['Price'].mean()` when duplicates occur, but I need to take the mean of only the duplicated rows, not the entire column. – Hamza Mar 08 '21 at 12:30
  • @OfficialDeveloper - Yes, but the `mean` of a group with no duplicates is just that row's value, so the solution works correctly. – jezrael Mar 08 '21 at 12:31
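One caveat about the 15-minute suggestion: it changes the step but keeps `repeat(24 - hour)`. Each time therefore still gets one row per remaining hour, so 08:28:31 would stop at 12:13:31 rather than near midnight. If the intent is to keep stepping until the end of the day, the repeat count has to be derived from the step. The sketch below works under that assumption. `expand_until_midnight`, `col` and `step_minutes` are hypothetical names, and the column is assumed to already hold datetimes parsed as in the answer, with a unique index.

```python
import pandas as pd

def expand_until_midnight(df, col='date', step_minutes=15):
    """Hypothetical helper: repeat each time in `col` every `step_minutes`
    minutes, keeping only values that fall before midnight."""
    t = df[col].dt
    seconds_of_day = t.hour * 3600 + t.minute * 60 + t.second
    remaining = 86400 - seconds_of_day           # seconds left before 24:00
    step = step_minutes * 60
    counts = -(-remaining // step)               # ceil division: slots starting before midnight
    out = df.loc[df.index.repeat(counts)]
    # offsets 0, step, 2*step, ... within each original row
    offsets = pd.to_timedelta(out.groupby(level=0).cumcount() * step_minutes, unit='min')
    return out[col].add(offsets).dt.time.reset_index(drop=True)

df = pd.DataFrame({'date': pd.to_datetime(['08:28:31', '23:50:00'], format='%H:%M:%S')})
s = expand_until_midnight(df)
print(len(s))                  # 64
print(s.iloc[1], s.iloc[62])   # 08:43:31 23:58:31
```

With `step_minutes=60` the count works out to `24 - hour`, so the helper reproduces the hourly behavior of the answer.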