How can I transform a column of characters written as

c("0 y", "0 m", "23 d", "0 y",  "0 m", "8 d")

into numeric values

c(0, 0, 23, 0, 0, 8)

Example of what I'm talking about: [image not available]

Another example that has some single-digit dates: [image not available]

Wiktor Stribiżew
bdg67
  • If you have `1 y 2 m 8d` etc., what value would you want for the year? – akrun Mar 07 '20 at 21:53
  • Try `gsub("^.*?(\\d+) d.*$", "\\1", x, perl = TRUE)` where x is your vector of strings – Allan Cameron Mar 07 '20 at 21:56
  • It is almost always preferable to add code and data to the question as formatted text rather than as pictures of code and data. Also, a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) is always welcome, although in this really simple case it might not be necessary. – dario Mar 07 '20 at 22:42
  • The answers never include months or years; it's a short-term follow-up value. Hope that clears it up. – bdg67 Mar 08 '20 at 02:00

3 Answers


Assuming y and m are always 0

Oy.date.diff <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d", "0 y, 0 m, 15 d") 
as.numeric(gsub(" d", "", gsub("0 y, 0 m, ", "", Oy.date.diff)))
# [1] 12 13 12 15

Note that syntactic names in R (variables or columns) cannot begin with a digit unless they are quoted with backticks, so the first character here is the uppercase letter O rather than a zero.
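Since this approach depends on every string starting with "0 y, 0 m, ", it may be worth checking that assumption before stripping the prefix. The following is only a sketch: the vector `x`, the third (non-conforming) element, and the choice to return NA for such rows are illustrative assumptions, not part of the original data.

```r
# Hypothetical input; the last element deliberately breaks the "0 y, 0 m" assumption.
x <- c("0 y, 0 m, 12 d", "0 y, 0 m, 8 d", "1 y, 2 m, 3 d")

# TRUE where the string begins with the expected fixed prefix.
ok <- startsWith(x, "0 y, 0 m, ")

# Start with NA everywhere, then fill in only the rows that satisfy the assumption.
days <- rep(NA_real_, length(x))
days[ok] <- as.numeric(sub(" d$", "", sub("^0 y, 0 m, ", "", x[ok])))

days
```

Rows with a non-zero year or month come back as NA instead of triggering a coercion warning, which makes them easy to spot with `which(is.na(days))`.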

akrun
dcarlson

We can use sub to capture the digits before the space followed by 'd'

as.integer(sub(".*\\s(\\d+) d", "\\1", v1))
#[1] 12 13 12 15 12

Or with regmatches/regexpr

regmatches(v1, regexpr("(\\d+)(?= d$)", v1, perl = TRUE))
#[1] "12" "13" "12" "15" "12"

If we need to convert everything to days (approximating a year as 365 days and a month as 30 days), then

library(dplyr)
library(tidyr)
tibble(col1 = v1) %>% 
  tidyr::extract(col1, into = c('year', 'month', 'day'),
       "^(\\d+) y, (\\d+) m, (\\d+) d$",  convert = TRUE) %>% 
  transmute(days = year * 365 + month * 30 + day)

data

v1 <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d",
        "0 y, 0 m, 15 d", "1 y, 2 m, 12 d")
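The same conversion to total days can also be sketched in base R, without dplyr or tidyr. This is a minimal illustration that assumes every string contains exactly three numbers in year, month, day order and reuses the 365/30 approximation; the names `parts`, `m`, and `total_days` are made up for the example.

```r
v1 <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "1 y, 2 m, 12 d")

# Extract every run of digits: one character vector per input string.
parts <- regmatches(v1, gregexpr("\\d+", v1))

# Guard the assumption that each string holds exactly three numbers.
stopifnot(all(lengths(parts) == 3))

# One row per string, columns in year, month, day order.
m <- matrix(as.numeric(unlist(parts)), ncol = 3, byrow = TRUE)

# Approximate total days: 365 per year and 30 per month.
total_days <- m[, 1] * 365 + m[, 2] * 30 + m[, 3]
total_days
```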
akrun

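You can try this capturing regex with gsub, which captures the number before " d" and makes no assumptions about the rest of the string. The leading `.*?` has to be lazy (hence perl = TRUE); a greedy `.*` would swallow every digit except the last one and return "2" instead of "12". The result is a character vector, so wrap it in as.numeric() if you need numbers:

x <- c("0 y, 0 m, 12 d", "0 y, 0 m, 13 d", "0 y, 0 m, 12 d", "0 y, 0 m, 15 d")
gsub("^.*?(\\d+) d.*$", "\\1", x, perl = TRUE)
#> [1] "12" "13" "12" "15"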
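To see how much the quantifier on the leading `.*` matters, here is a small side-by-side sketch. The variable names `greedy` and `lazy` and the shortened input vector are only for illustration.

```r
x <- c("0 y, 0 m, 12 d", "0 y, 0 m, 8 d")

# Greedy .* consumes as much as possible, leaving a single digit for the capture group.
greedy <- sub("^.*(\\d+) d.*$", "\\1", x)

# Lazy .*? consumes as little as possible, so the capture group receives the whole number.
lazy <- sub("^.*?(\\d+) d.*$", "\\1", x, perl = TRUE)

greedy            # "2" "8"
lazy              # "12" "8"
as.numeric(lazy)  # 12 8
```

The two versions agree on single-digit day counts, so the difference only shows up on multi-digit values.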
Allan Cameron