5

A column in my pandas data frame represents a time delta that I calculated with datetime then exported into a csv and read back into a pandas data frame. Now the column's dtype is object whereas I want it to be a timedelta so I can perform a groupby function on the dataframe. Below is what the strings look like. Thanks!

  0 days 00:00:57.416000
  0 days 00:00:12.036000
  0 days 16:46:23.127000  
 49 days 00:09:30.813000  
 50 days 00:39:31.306000  
 55 days 12:39:32.269000
-1 days +22:03:05.256000

Update, my best attempt at writing a for-loop to iterate over a specific column in my pandas dataframe:

def delta(i):
    days, timestamp = i.split(" days ")
    timestamp = timestamp[:len(timestamp)-7]
    t = datetime.datetime.strptime(timestamp,"%H:%M:%S") + 
    datetime.timedelta(days=int(days))
    delta = datetime.timedelta(days=t.day, hours=t.hour, 
    minutes=t.minute, seconds=t.second)
    delta.total_seconds()

data['diff'].map(delta)
Graham Streich
  • 874
  • 3
  • 15
  • 31

3 Answers3

5

Use pd.to_timedelta

pd.to_timedelta(df.iloc[:, 0])

0     0 days 00:00:57.416000
1     0 days 00:00:12.036000
2     0 days 16:46:23.127000
3    49 days 00:09:30.813000
4    50 days 00:39:31.306000
5    55 days 12:39:32.269000
6   -1 days +22:03:05.256000
Name: 0, dtype: timedelta64[ns]
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 1
    This answer is definitely the better one. I didn't know pandas had `to_timedelta` function, goode one PiRSquared – user1767754 Jun 18 '17 at 11:19
2
import datetime

#Parse your string
days, timestamp = "55 days 12:39:32.269000".split(" days ")
timestamp = timestamp[:len(timestamp)-7]

#Generate datetime object
t = datetime.datetime.strptime(timestamp,"%H:%M:%S") + datetime.timedelta(days=int(days))

#Generate a timedelta
delta = datetime.timedelta(days=t.day, hours=t.hour, minutes=t.minute, seconds=t.second)

#Represent in Seconds
delta.total_seconds()
user1767754
  • 23,311
  • 18
  • 141
  • 164
  • how do I iterate this over a specific column in a pandas dataframe? thanks -- I posted my best attempt as an update to my question! it could also be made into a new column.... – Graham Streich Jun 18 '17 at 04:34
  • 1
    @GrahamStreich you should use `map`, see this https://stackoverflow.com/a/19798528/5496463 – danche Jun 18 '17 at 04:42
  • thank you both. I made a function and applied it with the map method, however I ran across another problem. Some of the hours are prefaced with either a '+' or '-' sign...how do I account for these...I've updated the data in my question to show you what I mean and I've updated the code to show you the function I wrote and how I am using the map method. Thanks again! – Graham Streich Jun 18 '17 at 04:53
1

You could do something like this, looping through each value from the CSV in place of stringdate:

stringdate = "2 days 00:00:57.416000"
days_v_hms = string1.split('days')
hms = days_v_hms[1].split(':')
dt = datetime.timedelta(days=int(days_v_hms[0]), hours=int(hms[0]), minutes=int(hms[1]), seconds=float(hms[2]))

Cheers!

zachMade
  • 34
  • 5