
I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.

When the timedeltas are small, I get results that are off by 2 days, e.g.:

> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days

When they are larger, it doesn't work at all:

> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
Jeremy
  • See http://stackoverflow.com/questions/12649641/calculating-time-difference-in-r. In short, you can't do date/time math with strings, as you're attempting here. – tluh Aug 02 '16 at 20:12
  • From `?strptime`, `%d` is "*Day of the month as decimal number (01–31).*", not a number of days. I don't have a solution, but this explains the behavior somewhat (more than 31 days results in `NA`) – Gregor Thomas Aug 02 '16 at 20:12
  • @tluh It's not doing math, just coercing to a `difftime` object. – Gregor Thomas Aug 02 '16 at 20:18
  • You might need to do a little parsing - extract the days as a string, convert to numeric, and add it to the result of using `as.difftime()` on the H:M:S part. – Gregor Thomas Aug 02 '16 at 20:20
  • @Gregor That's good advice. I'll go down that route. Thanks. – Jeremy Aug 02 '16 at 20:27
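A minimal sketch of the parsing approach suggested in the comments: take the day count out as a number and hand only the H:M:S part to `as.difftime()`. The helper name `parse_tdelta` is made up for illustration, and it assumes every string looks like `<days> days HH:MM:SS[.fraction]`; the fractional seconds are simply dropped.

```r
# Hypothetical helper; assumes strings like "26 days 04:53:36.000000000"
parse_tdelta <- function(x) {
  days <- as.numeric(sub(" days.*$", "", x))  # "26 days 04:53:36..." -> 26
  hms  <- sub("^.* days ", "", x)             # -> "04:53:36.000000000"
  # as.difftime() parses the clock part with strptime(), which ignores the
  # trailing ".000000000", so any fractional seconds are dropped here
  clock <- as.difftime(hms, format = "%H:%M:%S", units = "secs")
  as.difftime(days * 86400, units = "secs") + clock
}

parse_tdelta(c("26 days 04:53:36.000000000", "36 days 04:53:36.000000000"))
```

Because the day count never goes through `%d`, values above 31 are fine: the first string works out to 26 × 86400 + 17616 = 2264016 seconds.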

1 Answer

I also read into R some timedelta objects I had processed with Python and had a similar issue with the `26 days 04:53:36.000000000` format. As Gregor said, `%d` in `strptime` is the day of the month as a zero-padded decimal number, so it won't work with numbers greater than 31. There doesn't seem to be an option for cumulative days, probably because `strptime` is meant for date-time objects rather than time delta objects.
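A quick check (not part of the original answer) of what `strptime` does with these strings. When the input doesn't specify a year and month, `strptime` fills in the current ones, so "26 days" is read as the 26th of this month, while a day of 36 can't be parsed at all and gives `NA`. As far as I can tell from its source, `as.difftime` with a `format` then measures the parsed time from midnight today (it subtracts `strptime("0:0:0", format = "%X")`), which would account for the 2-day offset in the question if it was run on the 2nd of the month, as the Aug 02 timestamps suggest.

```r
fmt <- "%d days %H:%M:%S"   # strptime() ignores the trailing ".000000000"

p26 <- strptime("26 days 04:53:36.000000000", format = fmt)
p36 <- strptime("36 days 04:53:36.000000000", format = fmt)

print(format(p26, "%Y-%m-%d %H:%M:%S"))  # the 26th of the current month and year
print(is.na(p36))                        # TRUE: no month has a 36th day

# Roughly what as.difftime(x, format = fmt) computes; depends on today's date
difftime(p26, strptime("0:0:0", format = "%X"), units = "days")
```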

My solution was to convert the objects to strings and extract the numerical parts, as Gregor suggested, using the `gsub` function.

# convert to strings (the column may have been read in as a factor)
data$tdelta <- as.character(data$tdelta)
# extract numerical data; the day pattern is anchored at the start so a greedy
# '.*' can't swallow the leading digit(s), e.g. "26 days" -> 6
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
# seconds: everything after the last colon, keeping any fractional part
seconds <- as.numeric(gsub('^.*:([0-9.]+)$','\\1',data$tdelta))
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
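The same extraction can also be done in one pass, with a single regular expression that captures all four fields at once. This is an alternative sketch rather than the answer's code; the sample `data` frame and the `to_seconds` helper are made up so the snippet runs on its own, and strings that don't match the assumed format become `NA`.

```r
# Hypothetical sample data standing in for the exported column
data <- data.frame(
  tdelta = c("26 days 04:53:36.000000000",
             "36 days 04:53:36.000000000",
             "0 days 00:00:36.500000000"),
  stringsAsFactors = FALSE
)

# One pattern, four capture groups: days, hours, minutes, seconds (with any fraction)
pattern <- "^([0-9]+) days ([0-9]+):([0-9]+):([0-9.]+)$"
parts <- regmatches(data$tdelta, regexec(pattern, data$tdelta))

# Each element is c(full match, days, hours, minutes, seconds);
# strings that don't match come back as character(0) and become NA
to_seconds <- function(m) {
  if (length(m) == 0) return(NA_real_)
  n <- as.numeric(m[2:5])
  n[1] * 86400 + n[2] * 3600 + n[3] * 60 + n[4]
}

data$tdelta_secs <- vapply(parts, to_seconds, numeric(1))
print(data)
```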

That should allow you to do computations with the time differences. Hope that helps.
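If you'd rather keep working with R's time-difference class than with plain numbers, the seconds can be wrapped back into a `difftime` object. A small sketch; the two values are a stand-in for the `time_diff_seconds` vector from the answer's code.

```r
# Stand-in for time_diff_seconds (values for "26 days 04:53:36" and "36 days 04:53:36")
time_diff_seconds <- c(2264016, 3128016)

td <- as.difftime(time_diff_seconds, units = "secs")
units(td) <- "days"   # same durations, now expressed in days

print(td)             # about 26.2 and 36.2 days
print(td[2] - td[1])  # differences subtract like numbers, keeping the units
print(mean(td))
```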

Muon