Is there a tidyr::extract equivalent for character vectors?

Question

I started wondering about this after coming across another question.

library(tidyverse)

set.seed(42)
df <- data.frame(x = cut(runif(100), c(0,25,75,125,175,225,299)))

tidyr::extract does a nice job splitting into groups defined by the regex:

df %>%
  extract(x, c("start", "end"), "(\\d+),(\\d+)") %>% head
#>   start end
#> 1     0  25
#> 2     0  25
#> 3     0  25
#> 4     0  25
#> 5     0  25
#> 6     0  25

This is the desired output for a character vector. I know I could just write a new function, but I wondered whether something like this already exists.

x_chr <- as.character(df$x)
des_res <- str_split(str_extract(x_chr, "(\\d+),(\\d+)"), ",")

head(des_res)
#> [[1]]
#> [1] "0"  "25"
#> 
#> [[2]]
#> [1] "0"  "25"
#> 
#> [[3]]
#> [1] "0"  "25"
#> 
#> [[4]]
#> [1] "0"  "25"
#> 
#> [[5]]
#> [1] "0"  "25"
#> 
#> [[6]]
#> [1] "0"  "25"

You could do it by `str_extract_all()`: `str_extract_all(x_chr, "\\d+")`. — tmfmnk, May 11 '21 at 07:16
this only works in this example - I'd like a way to actually split into defined groups. — tjebo, May 11 '21 at 07:17

Ronak Shah · Accepted Answer · 2021-05-11T07:28:56.990

5

You can use strcapture in base R:

strcapture("(\\d+),(\\d+)", x_chr, 
           proto = list(start = numeric(), end = numeric()))

#    start end
#1       0  25
#2       0  25
#3       0  25
#4       0  25
#5       0  25
#6       0  25
#...
#...
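As a small self-contained sketch of the same call, the snippet below uses a made-up input vector (not the data from the question) and a data.frame prototype with integer columns. It also includes one string that does not match, which should come back as a row of NA values rather than being dropped.

```r
# Illustrative input (an assumption, not the question's data):
# two interval labels and one string without a match.
x <- c("(0,25]", "(25,75]", "no interval")

# proto fixes the column names and the type each capture group is coerced to.
res <- strcapture("(\\d+),(\\d+)", x,
                  proto = data.frame(start = integer(), end = integer()))

res
```

The result has one row per input element, so it lines up with the original vector even when some elements do not match.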

You can also use stringr::str_match:

stringr::str_match(x_chr, "(\\d+),(\\d+)")[, -1]

In the matrix returned by str_match, the first column holds the complete match, whereas the subsequent columns hold the capture groups. Indexing with [, -1] drops the first column and keeps only the groups.
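If you want something closer to the tidyr::extract interface, a thin wrapper around str_match might look like the sketch below. The helper name extract_chr and its arguments are hypothetical (they are not part of stringr or tidyr), and the input vector is made up for illustration.

```r
library(stringr)

# Hypothetical helper, not part of stringr or tidyr: names the capture-group
# columns of str_match() so the result resembles tidyr::extract() output.
extract_chr <- function(x, into, regex, convert = FALSE) {
  # Drop the full-match column; drop = FALSE keeps a matrix for a single group.
  m <- str_match(x, regex)[, -1, drop = FALSE]
  stopifnot(ncol(m) == length(into))
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- into
  if (convert) {
    # Let R guess column types, similar in spirit to convert = TRUE in extract().
    out[] <- lapply(out, utils::type.convert, as.is = TRUE)
  }
  out
}

# Illustrative input (an assumption, not the question's data).
x <- c("(0,25]", "(25,75]", "no interval")
res <- extract_chr(x, c("start", "end"), "(\\d+),(\\d+)", convert = TRUE)
```

Non-matching elements become rows of NA, and with convert = FALSE the columns stay character.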

edited May 11 '21 at 07:28

answered May 11 '21 at 07:24



nice, I definitely didn’t know that:) do you know of a stringi or stringr equivalent? don’t worry if not – tjebo May 11 '21 at 07:26

Yes, you can use `str_match`/`stri_match`. Updated the answer. – Ronak Shah May 11 '21 at 07:30
