0

I am working in R. I want to extract the number between the last blank space and a string pattern ("-APPLE") from each element of a vector. The numbers can be of variable length.

test_string = c("ABC 2-APPLE", "123 25-APPLE", "DEF GHI 567-APPLE", "ORANGE")

The expected result is a vector such as c(2, 25, 567, NA).

Prabha

2 Answers

1

See "Regex group capture in R with multiple capture-groups" for an example of using str_match() from the stringr package.

In your case:

> test_string = c("ABC 2-APPLE", "123 25-APPLE", "DEF GHI 567-APPLE")
> 
> library(stringr)
> x <- str_match(test_string, " ([0-9]+)-APPLE$")[,2]
> as.numeric(x)
[1]   2  25 567
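The transcript above leaves out the "ORANGE" element from the question. As a rough sketch of how the same pattern could handle it without any extra packages, here is a base R variant; the names `pattern`, `m`, and `result` are only illustrative, and the approach (regexec() with regmatches()) is a substitute for str_match(), not what the answer itself uses.

```r
test_string <- c("ABC 2-APPLE", "123 25-APPLE", "DEF GHI 567-APPLE", "ORANGE")

# Same pattern as in the answer: a space, one or more digits (captured),
# then "-APPLE" at the end of the string
pattern <- " ([0-9]+)-APPLE$"

# For each element, regmatches() returns the full match followed by the
# capture groups, or character(0) when the pattern does not match
m <- regmatches(test_string, regexec(pattern, test_string))

# Keep the first capture group, or NA where nothing matched
result <- as.numeric(vapply(
  m,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1)
))
result
```

Elements that do not match, such as "ORANGE", come back as NA, which is what the question asks for.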
Joe
  • Thanks, this is helpful. Two follow-up questions: 1. Can you help me understand the significance of the $ character? 2. Why are there 2 columns in the output of the str_match function (i.e. in object x)? – Prabha Mar 30 '19 at 05:22
  • 1. [What does the $ (dollar) sign in a regex pattern mean?](https://stackoverflow.com/questions/23877408/what-does-the-dollar-sign-in-a-regex-pattern-mean) 2. [`str_match`](https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_match) "First column is the complete match, followed by one column for each capture group." – Joe Mar 30 '19 at 06:38
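Both points from the comments can be checked directly. The sketch below assumes stringr is installed; the string `extra` is made up for illustration and is not part of the original question.

```r
library(stringr)

test_string <- c("ABC 2-APPLE", "123 25-APPLE", "DEF GHI 567-APPLE", "ORANGE")

# Full result of str_match(): column 1 is the complete match (including the
# leading space), column 2 is the capture group ([0-9]+). Indexing with [, 2]
# keeps only the captured digits.
m <- str_match(test_string, " ([0-9]+)-APPLE$")
m

# Effect of the $ anchor, using a hypothetical string where "-APPLE" is not
# at the end of the string
extra <- "ABC 2-APPLE PIE"
anchored   <- str_match(extra, " ([0-9]+)-APPLE$")[, 2]  # $ requires end of string
unanchored <- str_match(extra, " ([0-9]+)-APPLE")[, 2]   # no anchor
anchored
unanchored
```

With the anchor, the hypothetical string does not match because text follows "-APPLE"; without it, the digits are still captured.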
1

You can use the "rebus" package, which makes it easy to build the regex patterns you need.

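library(rebus)
library(stringr)  # provides str_extract()

## adjust the lo and hi arguments of dgt() based on your text
## pattern: 1 to 5 digits preceded by a space and followed by "-APPLE"
rx <- lookbehind(SPACE) %R% dgt(1, 5) %R% lookahead("-APPLE")
as.numeric(str_extract(test_string, rx))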
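For readers who prefer a plain regex, the same lookbehind/lookahead idea can be written out by hand. This is a sketch that should behave like the rebus pattern on these inputs, not necessarily the exact string rebus generates; `rx_plain` and `result` are illustrative names.

```r
library(stringr)

test_string <- c("ABC 2-APPLE", "123 25-APPLE", "DEF GHI 567-APPLE", "ORANGE")

# (?<=\\s)     lookbehind: the digits must be preceded by whitespace
# \\d{1,5}     one to five digits
# (?=-APPLE)   lookahead: the digits must be followed by "-APPLE"
rx_plain <- "(?<=\\s)\\d{1,5}(?=-APPLE)"

# str_extract() returns NA for elements without a match
result <- as.numeric(str_extract(test_string, rx_plain))
result
```

Because the lookarounds are not part of the match, str_extract() returns only the digits, so no capture group or column indexing is needed.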
ayeh