How to mutate one column with `stringr` to select just some text between certain characters?

Question

I have some data in this format:

#> # A tibble: 2 × 2
#>   record id                                                           
#>    <int> <chr>                                                        
#> 1      1 "<a href=\"https://www.example.com/dir1/dir2/8379\">8379</a>"
#> 2      2 "<a href=\"https://www.example.com/dir1/dir2/8179\">8179</a>"

I would like to use stringr to be left with just the part of the string between ">" and "<".

So my desired output would be:

#> # A tibble: 2 × 2
#>   record id                                                           
#>    <int> <chr>                                                        
#> 1      1 "8379"
#> 2      2 "8179"

I have tried using str_match:

str_match(df$id, pattern = ">(....)<")

and the second column is what I'm after:

#>      [,1]     [,2]  
#> [1,] ">8379<" "8379"
#> [2,] ">8179<" "8179"

How do I now use it in, say, a mutate command to change a column in the data frame?

Tidyverse solutions preferred, but open to all answers.

Code for data entry below.

library(tidyverse)
df <-  tibble::tribble(
  ~record,                                                           ~id,
       1L, "<a href=\"https://www.example.com/dir1/dir2/8379\">8379</a>",
       2L, "<a href=\"https://www.example.com/dir1/dir2/8179\">8179</a>"
  )
df

str_match(df$id, pattern = ">(....)<")

Do you need `df %>% mutate(id1 = str_match(id, pattern = ">(....)<")[, 2])`? — Nad Pat, Mar 17 '22 at 06:34
Or `sub("^.*>(\\d+)<.*$", "\\1", df$id)`. But this is not a `stringr` solution. — Rui Barradas, Mar 17 '22 at 06:53

score 0 · Answer 1 · answered Jun 20 '23 at 09:00

You can use str_extract() with a regex built from lookarounds. The lookbehind `(?<=\\>)` requires a ">" immediately before the match, and the lookahead `(?=\\<)` requires a "<" immediately after it; neither character becomes part of the extracted text. The code:

df %>%
  mutate(id = str_extract(id, "(?<=\\>)(.*)(?=\\<)"))

#   record   id   
#   <int> <chr>
# 1      1 8379 
# 2      2 8179
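
The str_match() attempt from the question can also go inside mutate() directly. The following is a minimal sketch rather than a definitive solution: the column names `id_match` and `id_extract` and the object `out` are made up for illustration, and the character class `[^<]+` is swapped in for the `....` and `.*` patterns used above, on the assumption that the wanted text never contains a "<".

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- tibble::tribble(
  ~record, ~id,
  1L, "<a href=\"https://www.example.com/dir1/dir2/8379\">8379</a>",
  2L, "<a href=\"https://www.example.com/dir1/dir2/8179\">8179</a>"
)

out <- df %>%
  mutate(
    # str_match() returns a character matrix: column 1 is the full match,
    # column 2 is the first capture group, so [, 2] keeps only the group.
    id_match = str_match(id, ">([^<]+)<")[, 2],
    # Lookaround variant: [^<]+ stops at the first "<" instead of
    # matching greedily up to the last one in the string.
    id_extract = str_extract(id, "(?<=>)[^<]+(?=<)")
  )

out
```

Both new columns should hold "8379" and "8179" for this data. Inside mutate() the column is referred to as `id` rather than `df$id`. The greedy `.*` gives the same result here because each string holds a single tag; with several tags in one string it would run from the first ">" to the last "<".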
