
I need to find the number after the string "Count of". There could be a space or a symbol between "Count of" and the number. I have a pattern that works on www.regex101.com, but it does not work with the stringr function str_extract.

library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "count of ([\\d]+)")
# [1] NA NA NA NA "count of 5" "count of 50" NA

What I want to get:

[1] NA NA NA NA "5" "50" "10"
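The pattern above fails on "chicken Count Of-10" for two separate reasons: it is case-sensitive ("Count Of") and it requires a literal space, while that element has a hyphen. Independently of that, str_extract() always returns the whole match, so the capture group that regex101 displays is ignored. A minimal sketch of the difference, using stringr's str_match(), which returns the capture groups as extra matrix columns (the two-element vector x is just an illustrative subset of shopping_list):

```r
library(stringr)

x <- c("monkey coconut count of 50", "chicken Count Of-10")
pattern <- "count of ([\\d]+)"

# str_extract() returns the whole match; the capture group is not used
str_extract(x, pattern)
# [1] "count of 50" NA

# str_match() returns a character matrix: column 1 holds the whole match,
# column 2 the first capture group; rows without a match are NA
m <- str_match(x, pattern)
m[, 2]
# [1] "50" NA
```

The second element is still NA with str_match() because of the capitalisation and the hyphen, which the answers below address.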
Ronak Shah
Matthew Crews

3 Answers

str_extract(shopping_list, "(?i)(?<=count of\\D)\\d+")
# [1] NA   NA   NA   NA   "5"  "50" "10"

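Here (?i) makes the pattern case-insensitive, \\D matches exactly one non-digit character (such as a space or a hyphen), and (?<=...) is a positive lookbehind, so "count of" and that one character must precede the digits but are not part of the returned match.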
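To see exactly what the single-character \\D in the lookbehind allows, here is a small check on a few illustrative strings; the third one is the "coconut count of - 5" case raised in the first comment:

```r
library(stringr)

pattern <- "(?i)(?<=count of\\D)\\d+"

tests <- c("count of 5",            # one space between "of" and the digits
           "COUNT OF-10",           # one hyphen, upper case
           "coconut count of - 5")  # three characters: space, hyphen, space

str_extract(tests, pattern)
# [1] "5"  "10" NA
```

The third string gives NA because the text directly before the digits is "count of - ", not "count of" followed by exactly one non-digit.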

Julius Vainora
  • I was thinking of a lookbehind but if the data has a slight variation it will fail. Try `"coconut count of - 5"` – Pierre L Mar 11 '16 at 18:30
  • @PierreLafortune, true, but in this case I understand that there can be only one symbol between "f" and a number. – Julius Vainora Mar 11 '16 at 18:41
  • This is the query that is providing the value that I need using just the regex. Thank you! – Matthew Crews Mar 11 '16 at 19:00
  • @Julius, is it possible to make the `\\D` match any number of non-numeric, non-letter characters? – Matthew Crews Mar 11 '16 at 19:08
  • @MatthewCrews, unfortunately, as far as I know, a positive lookbehind only allows text of a predefined length, so no (if we want to keep the same spirit as in this answer). That is what Pierre Lafortune had in mind in his comment. – Julius Vainora Mar 11 '16 at 19:33
  • @Julius, thank you for the information. The reason I did not accept Pierre's answer was that it returned text for every element of the vector and then depended on the `as.numeric` function to eliminate the elements that did not contain a value of interest. This answer extracts the value of interest using just RegEx. – Matthew Crews Mar 11 '16 at 19:38
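On the question of allowing any number of separator characters: one way around the fixed-length lookbehind is to drop it, match the separator explicitly, and read the digits from a capture group with str_match(). A sketch, where the class [^A-Za-z0-9]* is my reading of "non-numeric, non-letter" (any run, possibly empty, of characters that are neither letters nor digits), and the extra element "coconut count of - 5" is the test case from the first comment:

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2",
                   "monkey coconut 3oz count of 5", "monkey coconut count of 50",
                   "chicken Count Of-10", "coconut count of - 5")

# (?i): case-insensitive; [^A-Za-z0-9]*: any separator of non-letters/non-digits;
# (\\d+): the digits we want, returned in column 2 of the matrix
m <- str_match(shopping_list, "(?i)count of[^A-Za-z0-9]*(\\d+)")
m[, 2]
# [1] NA   NA   NA   NA   "5"  "50" "10" "5"
```

Because the separator is matched rather than looked behind, its length no longer matters; the trade-off is that the answer comes from a capture-group column instead of directly from str_extract().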

Lookarounds (here, a lookbehind) are what you are looking for:

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "(?<=count of )[0-9]+")  # + requires at least one digit
# [1] NA   NA   NA   NA   "5"  "50" NA
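Two details in this pattern explain the NA on the last element: the lookbehind is case-sensitive, and it demands a literal space after "of". A sketch using stringr's regex() modifier with ignore_case = TRUE; the [ \\-] class in the second call assumes the separator is always a single space or hyphen, which keeps the lookbehind fixed-length:

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2",
                   "monkey coconut 3oz count of 5", "monkey coconut count of 50",
                   "chicken Count Of-10")

# Case-insensitive, but still a literal space: "Of-10" is missed
str_extract(shopping_list, regex("(?<=count of )[0-9]+", ignore_case = TRUE))
# [1] NA   NA   NA   NA   "5"  "50" NA

# A one-character class (space or escaped hyphen) keeps the lookbehind fixed-length
str_extract(shopping_list, regex("(?<=count of[ \\-])[0-9]+", ignore_case = TRUE))
# [1] NA   NA   NA   NA   "5"  "50" "10"
```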
cory
as.numeric(sub("(?i).*count of.*?(\\d+).*", "\\1", shopping_list))
# [1] NA NA NA NA  5 50 10

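The regex pattern is:

  • (?i): ignore case
  • .*count of: any characters up to and including "count of"
  • .*?: as few characters as possible (e.g. " ", "-" or " - ") before the number
  • (\\d+): capture one or more digits
  • .*: the rest of the string, so the whole element is replaced

The replacement "\\1" returns the captured digits. Elements without a match are returned unchanged, and as.numeric() turns them into NA (with a coercion warning).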

As of now, the other answers will fail with something like "coconut count of - 5", since they only allow a single character (a space, or one non-digit) between "count of" and the number.
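To check this, here is a sketch applying the sub() call above to a few strings, including the "coconut count of - 5" case. I have added perl = TRUE (an assumption, not part of the answer) so that (?i) and the lazy .*? are handled by PCRE. It also shows the behaviour discussed in the comments on the first answer: sub() returns non-matching elements unchanged, and it is as.numeric() that turns them into NA.

```r
x <- c("bag of flour", "chicken Count Of-10", "coconut count of - 5")

# .*? lazily skips any separator between "count of" and the digits;
# the trailing .* consumes the rest so the whole element is replaced by \\1
out <- sub("(?i).*count of.*?(\\d+).*", "\\1", x, perl = TRUE)
out
# "bag of flour" is returned unchanged; the others become "10" and "5"

# as.numeric() turns the leftover text into NA and warns; suppressWarnings() hides that
suppressWarnings(as.numeric(out))
# [1] NA 10  5
```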

Pierre L