The input is a vector of free-text fields. The task is to extract a pattern like "234-5678" from each string.

For example, given the following vector:

text <- c("abced 156-8790","kien 3578-562839 bewsd","$nietl 66320-98703","789-55340")

What I would like to extract is:

return <- c("156-8790","578-5628","320-9870","789-5534")

I was considering using gsub("^[([:digit:]{3})[-]([:digit:]{4})]", replacement = "", text), but the regex does not work the way I want. Could anyone please help with this? Many thanks in advance!

wp78de
Anne

1 Answer

We can use str_extract to match 3 digits (\\d{3}), followed by a -, followed by 4 digits (\\d{4}):

library(stringr)
str_extract(text, "\\d{3}-\\d{4}")
#[1] "156-8790" "578-5628" "320-9870" "789-5534"

Or using base R with regmatches/regexpr

regmatches(text, regexpr("\\d{3}-\\d{4}", text))
#[1] "156-8790" "578-5628" "320-9870" "789-5534"
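
One caveat with the regmatches/regexpr approach: regmatches returns only the elements that matched, so the result can be shorter than the input when some strings contain no such pattern. A minimal sketch that keeps the output aligned with the input by filling non-matches with NA follows; the vector text2 and the names m and out are made up for illustration and are not part of the original question.

```r
# Hypothetical input: the second element contains no match
text2 <- c("abced 156-8790", "no number here", "789-55340")

# regexpr() returns the start of the first match, or -1 when there is none
m <- regexpr("\\d{3}-\\d{4}", text2)

# Pre-fill with NA, then write the matches into the positions that matched
out <- rep(NA_character_, length(text2))
out[m != -1] <- regmatches(text2, m)

out
#[1] "156-8790" NA         "789-5534"
```

This gives one result per input element, which is convenient when the result is stored as a column next to the original text.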
akrun