I am fairly new to R and trying to learn the basics.

I have run into a problem converting one of my columns into seconds for analysis.

When I try to convert ride_length to numeric, I get the warning "NAs introduced by coercion", and I cannot seem to change the column from character. I am sure there is a way to calculate the travel time from started_at / ended_at, but I thought it would be easier to use the ride_length column instead.
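
For context, the warning can be reproduced with a minimal sketch like the one below. It assumes the conversion was attempted with `as.numeric()`, which the post does not actually show, and it uses three values copied from the ride_length column.

```r
# Hypothetical reproduction: the original attempt is not shown in the post,
# so calling as.numeric() on the raw strings is an assumption.
ride_length <- c("0:10:25", "0:04:04", "0:01:20")

# "0:10:25" is not a valid number literal, so every element is coerced to NA.
# Without suppressWarnings(), R reports "NAs introduced by coercion".
secs <- suppressWarnings(as.numeric(ride_length))
print(secs)
```

The strings have to be parsed as durations first; a plain numeric cast has no way to interpret the colons.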

glimpse(all_trips)
Rows: 5,595,063
Columns: 15
$ ride_id            <chr> "E19E6F1B8D4C42ED", "DC88F20C2C55F27F", "EC45C94683FE3F27", "4FA453A75AE377DB", "BE5E8EB4E72~
$ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electr~
$ started_at         <chr> "1/23/2021 16:14", "1/27/2021 18:43", "1/21/2021 22:35", "1/7/2021 13:31", "1/23/2021 2:24",~
$ ended_at           <chr> "1/23/2021 16:24", "1/27/2021 18:47", "1/21/2021 22:37", "1/7/2021 13:42", "1/23/2021 2:24",~
$ start_station_name <chr> "California Ave & Cortez St", "California Ave & Cortez St", "California Ave & Cortez St", "C~
$ start_station_id   <chr> "17660", "17660", "17660", "17660", "17660", "17660", "17660", "17660", "17660", "17660", "1~
$ end_station_name   <chr> "", "", "", "", "", "", "", "", "", "Wood St & Augusta Blvd", "California Ave & North Ave", ~
$ end_station_id     <chr> "", "", "", "", "", "", "", "", "", "657", "13258", "657", "657", "657", "KA1504000135", "KA~
$ member_casual      <chr> "member", "member", "member", "member", "casual", "casual", "member", "member", "member", "m~
$ ride_length        <chr> "0:10:25", "0:04:04", "0:01:20", "0:11:42", "0:00:43", "0:53:47", "0:05:35", "0:06:40", "0:0~
$ day_of_week        <int> 7, 4, 5, 5, 7, 7, 2, 5, 7, 1, 7, 7, 7, 1, 6, 3, 7, 4, 6, 1, 2, 5, 2, 6, 7, 5, 7, 1, 2, 2, 5,~
$ date               <date> 2020-01-23, 2020-01-27, 2020-01-21, 2020-01-07, 2020-01-23, 2020-01-09, 2020-01-04, 2020-01~
$ month              <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "0~
$ day                <chr> "23", "27", "21", "07", "23", "09", "04", "14", "09", "24", "23", "09", "09", "24", "22", "0~
$ year               <chr> "2020", "2020", "2020", "2020", "2020", "2020", "2020", "2020", "2020", "2020", "2020", "202~

> dput(head(all_trips, 10))
structure(list(ride_id = c("E19E6F1B8D4C42ED", "DC88F20C2C55F27F", 
"EC45C94683FE3F27", "4FA453A75AE377DB", "BE5E8EB4E7263A0B", "5D8969F88C773979", 
"09275CC10F854E9E", "DF7A32A217AEFB14", "C2EFC62379EB716C", "B9F73448DFBE0D45"
), rideable_type = c("electric_bike", "electric_bike", "electric_bike", 
"electric_bike", "electric_bike", "electric_bike", "electric_bike", 
"electric_bike", "electric_bike", "classic_bike"), started_at = c("1/23/2021 16:14", 
"1/27/2021 18:43", "1/21/2021 22:35", "1/7/2021 13:31", "1/23/2021 2:24", 
"1/9/2021 14:24", "1/4/2021 5:05", "1/14/2021 15:07", "1/9/2021 9:57", 
"1/24/2021 19:15"), ended_at = c("1/23/2021 16:24", "1/27/2021 18:47", 
"1/21/2021 22:37", "1/7/2021 13:42", "1/23/2021 2:24", "1/9/2021 15:17", 
"1/4/2021 5:10", "1/14/2021 15:13", "1/9/2021 10:00", "1/24/2021 19:22"
), start_station_name = c("California Ave & Cortez St", "California Ave & Cortez St", 
"California Ave & Cortez St", "California Ave & Cortez St", "California Ave & Cortez St", 
"California Ave & Cortez St", "California Ave & Cortez St", "California Ave & Cortez St", 
"California Ave & Cortez St", "California Ave & Cortez St"), 
    start_station_id = c("17660", "17660", "17660", "17660", 
    "17660", "17660", "17660", "17660", "17660", "17660"), end_station_name = c("", 
    "", "", "", "", "", "", "", "", "Wood St & Augusta Blvd"), 
    end_station_id = c("", "", "", "", "", "", "", "", "", "657"
    ), member_casual = c("member", "member", "member", "member", 
    "casual", "casual", "member", "member", "member", "member"
    ), ride_length = c("0:10:25", "0:04:04", "0:01:20", "0:11:42", 
    "0:00:43", "0:53:47", "0:05:35", "0:06:40", "0:02:31", "0:07:13"
    ), day_of_week = c(7L, 4L, 5L, 5L, 7L, 7L, 2L, 5L, 7L, 1L
    ), date = structure(c(18284, 18288, 18282, 18268, 18284, 
    18270, 18265, 18275, 18270, 18285), class = "Date"), month = c("01", 
    "01", "01", "01", "01", "01", "01", "01", "01", "01"), day = c("23", 
    "27", "21", "07", "23", "09", "04", "14", "09", "24"), year = c("2020", 
    "2020", "2020", "2020", "2020", "2020", "2020", "2020", "2020", 
    "2020")), row.names = c(NA, 10L), class = "data.frame")
phalteman
memshark
  • 1
    Welcome to SO, memshark! There's nothing we can do without seeing your data. Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Mar 15 '22 at 20:03
  • 1
    Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Mar 15 '22 at 20:03
  • 1
    It's not clear how we can help you without seeing the format of your data. Could you edit your question to include the result of `dput(head(all_trips, 10))`? Although almost any character string representing a date-time can be parsed into actual date-time objects, the code to do it depends directly on the format of your columns, so without this we would be guessing. – Allan Cameron Mar 15 '22 at 20:06
  • 2
    I apologize for the confusion and appreciate the feedback. I added the data in place of the images I used originally. I hope it helps to clarify my question. – memshark Mar 15 '22 at 20:59

1 Answer

If I understand your requirement, I think this should work:

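library(tidyverse)   # provides %>% and pull()
library(lubridate)   # provides hms()

dt %>% 
  pull(ride_length) %>%  # extract the column as a character vector
  hms() %>%              # parse "H:MM:SS" strings into Period objects
  as.numeric()           # convert each Period to a number of seconds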

where dt is your data frame (all_trips in your question).

Using the dput you provided, this results in:

 [1]  625  244   80  702   43 3227  335  400  151  433

where your ride_length vector has these values:

[1] "0:10:25" "0:04:04" "0:01:20" "0:11:42" "0:00:43" "0:53:47" "0:05:35" "0:06:40" "0:02:31" "0:07:13"
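
If the seconds should live in the data frame rather than in a standalone vector, something along these lines may work. This is a sketch and not part of the answer as posted: the column names `ride_length_secs` and `elapsed_secs` are made up for illustration, the three-row `dt` is a stand-in built from the dput in the question, and `period_to_seconds()` is swapped in for `as.numeric()` only because it states the unit explicitly. The second column computes the duration from started_at / ended_at, assuming those strings are always in month/day/year hour:minute form as in the sample.

```r
library(dplyr)
library(lubridate)

# Stand-in for the real data frame (first three rows of the dput in the question)
dt <- data.frame(
  started_at  = c("1/23/2021 16:14", "1/27/2021 18:43", "1/21/2021 22:35"),
  ended_at    = c("1/23/2021 16:24", "1/27/2021 18:47", "1/21/2021 22:37"),
  ride_length = c("0:10:25", "0:04:04", "0:01:20")
)

dt <- dt %>%
  mutate(
    # Parse "H:MM:SS" into a Period, then express it in seconds.
    # ride_length_secs is a hypothetical column name.
    ride_length_secs = period_to_seconds(hms(ride_length)),
    # Alternative from the timestamps; elapsed_secs is also hypothetical.
    # The sample strings carry no seconds, so this is only minute-precise.
    elapsed_secs = as.numeric(
      difftime(mdy_hm(ended_at), mdy_hm(started_at), units = "secs")
    )
  )

print(dt[, c("ride_length", "ride_length_secs", "elapsed_secs")])
```

Because started_at and ended_at in the sample are recorded only to the minute, the timestamp difference will not match ride_length exactly (600 versus 625 seconds for the first row), which is one reason to prefer the ride_length column here.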
Robert Long