-4

I have a field called Usage.hours in dataframe which was actually a factor type and then I converted it to character type and now I want to convert it to numeric type so that I can perform some aggregate functions on it but I am unable to do so. While trying to do its introducing NAs.

unisim_usage_hours_may_2020$Usage.Hours <- as.numeric(unisim_usage_hours_may_2020$Usage.Hours)

Warning message:

NAs introduced by coercion

Some values for the fields are 00:00:00,11:34:20 etc.

I tried using hms package and although it did convert the field to time format(hh:mm:ss) but it introduced NAs as well because some fields are having values as 138:20:36,210:45:12 etc, so wherever it finds hours in 3 digits it is introducing NA there.

halfer
  • 19,824
  • 17
  • 99
  • 186
  • Hello :) I am not sure to understand your question. Dates and times are not of numeric type in R. You need to use `as.Date()` or `as.POSIXct()` to convert the factor or the character strings. Then you can perform calculations. Feel free to post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Paul Jul 08 '20 at 05:57
  • What you show (`11:34:20`) is not a number. It looks like it might be *time*, but it is not a "number" that can be converted with `as.numeric`. What do you expect `11:34:20` to convert into? – r2evans Jul 08 '20 at 06:11
  • How do i convert it to hh:mi:ss only as couple of functions are adding date also which i dont want. – Ritwik Mohapatra Jul 08 '20 at 06:25
  • That is *still* not a number. What do you expect to do with `hh:mi:ss`? – r2evans Jul 08 '20 at 06:31
  • (I should clarify: it is not a number *for base R functions*. My answer suggests the use of the `lubridate` package, which does have the concept of a "time" without a day/date. I've not dealt with it, so that's all I know there. For anything else, you will likely need to demonstrate what you will eventually be doing with this data.) – r2evans Jul 08 '20 at 06:39

1 Answers1

1

This is a guess at what you ultimately need: fractional hours. I think you want 0 and around 11.57 for those two times shown. I'm going to infer that they aren't meant to be time-of-day, just "time spent" (in hours:minutes:seconds).

Two techniques for parsing this:

  1. Here's a trick that relies on R's use of 1970-01-01 as "origin of unix epoch time".

    as.POSIXct(paste("1970-01-01", unisim_usage_hours_may_2020$Usage.Hours), tz = "UTC")
    # [1] "1970-01-01 00:00:00 UTC" "1970-01-01 11:34:20 UTC"
    as.numeric(as.POSIXct(paste("1970-01-01", unisim_usage_hours_may_2020$Usage.Hours), tz = "UTC"))
    # [1]     0 41660
    as.numeric(as.POSIXct(paste("1970-01-01", unisim_usage_hours_may_2020$Usage.Hours), tz = "UTC")) / 3600
    # [1]  0.00000 11.57222
    
  2. Parse it manually, assuming that there aren't any surprised ...

    strsplit(unisim_usage_hours_may_2020$Usage.Hours, ":")
    # [[1]]
    # [1] "00" "00" "00"
    # [[2]]
    # [1] "11" "34" "20"
    sapply(strsplit(unisim_usage_hours_may_2020$Usage.Hours, ":"), function(a) sum(as.numeric(a) / c(1, 60, 3600)))
    # [1]  0.00000 11.57222
    

Use whichever makes the most sense to you, since you're the one maintaining your code. (If performance matters to you, then the second is faster only with very small datasets; on my machine, the first can take half the time with only a few thousand rows of data. Granted, we're talking half-time measured in microseconds, so it's not cosmic, but if you have "large-ish" data then ... *shrug*.)

Another option (though I don't know what you're meaning to do with this):

  1. Use the lubridate package to convert it into a lubridate-proprietary class:

    lubridate::hms(unisim_usage_hours_may_2020$Usage.Hours)
    # [1] "0S"          "11H 34M 20S"
    

    Over to you with what to do with those ... I am not well-versed in its use.


Data:

unisim_usage_hours_may_2020 <- data.frame(Usage.Hours = c("00:00:00", "11:34:20"), stringsAsFactors = FALSE)
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • i also have value like 138:24:42,230:36:12 etc as usage hours.so all the above functions and other things which i have tried don't help since %H format is only used for 2 digit hour format. – Ritwik Mohapatra Jul 08 '20 at 11:50
  • *"so all the above functions ... don't help"* ... **wrong**. (1) It's important to show relevant examples in your question, I think these (at least one of them) are perfectly relevant examples to demonstrate that you are not dealing with time-of-day. The first few comments (before your question was closed) should have triggered you to suggest better examples like what you just gave me. (2) My #2 above worked on these additional samples, returning `138.4117` and `230.6033`. What (in R terms) are you expecting? And I say again, how are you needing to use these values? – r2evans Jul 08 '20 at 15:37