extract number after specific string

Question

I need to find the number that follows the string "Count of". There could be a space or a symbol between "Count of" and the number. I have a pattern that works on www.regex101.com but does not return what I want with stringr's `str_extract` function.

library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "count of ([\\d]+)")
[1] NA NA NA NA "count of 5" "count of 50" NA

What I want to get:

[1] NA NA NA NA "5" "50" "10"

score 6 · Accepted Answer · answered Mar 11 '16 at 18:25


str_extract(shopping_list, "(?i)(?<=count of\\D)\\d+")
# [1] NA   NA   NA   NA   "5"  "50" "10"

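where (?i) makes the pattern case insensitive, \\D matches exactly one non-digit character, and (?<=...) is a positive lookbehind, so the text inside it must precede the match but is not returned as part of it.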
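To see what each piece contributes, here is a small sketch that drops one element of the pattern at a time. The vector `x` and the variable names are illustrative and not part of the original answer.

```r
library(stringr)

# Two of the question's inputs: lower case with a space, mixed case with a hyphen
x <- c("monkey coconut count of 50", "chicken Count Of-10")

# Without (?i), "Count Of" does not match the lower-case literal
no_flag <- str_extract(x, "(?<=count of\\D)\\d+")
# [1] "50" NA

# Without \\D, the digits would have to follow "count of" directly
no_separator <- str_extract(x, "(?i)(?<=count of)\\d+")
# [1] NA NA

# Full pattern from the answer
full <- str_extract(x, "(?i)(?<=count of\\D)\\d+")
# [1] "50" "10"
```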


Julius Vainora


I was thinking of a lookbehind but if the data has a slight variation it will fail. Try `"coconut count of - 5"` – Pierre L Mar 11 '16 at 18:30
@PierreLafortune, true, but in this case I understand that there can be only one symbol between "f" and a number. – Julius Vainora Mar 11 '16 at 18:41
This is the query that is providing the value that I need using just the regex. Thank you! – Matthew Crews Mar 11 '16 at 19:00
@Julius, is it possible to make the `\\D` match any number of non-numeric, non-letter characters? – Matthew Crews Mar 11 '16 at 19:08
@MatthewCrews, unfortunately, as far as I know, positive lookbehind allows only for a text of predefined length, so no (if we want to maintain the same spirit as in this answer). That is what Pierre Lafortune had in mind in his comment. – Julius Vainora Mar 11 '16 at 19:33
@Julius, thank you for the information. The reason I did not accept Pierre's answer was because it was returning text from every element of the vector and then depended on the `as.numeric` function to eliminate the elements which did not have a value of interest. This answer actually extracts the element of interest using just RegEx. – Matthew Crews Mar 11 '16 at 19:38
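One possible way around the lookbehind length restriction discussed in these comments, offered as a sketch and not as part of the accepted answer, is to drop the lookbehind and use a capture group with stringr's `str_match`. The extra input "coconut count of - 5" is taken from Pierre L's comment; the names `extended_list` and `counts` are illustrative.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2",
                   "monkey coconut 3oz count of 5", "monkey coconut count of 50",
                   "chicken Count Of-10")

# Add the multi-character separator case raised in the comments
extended_list <- c(shopping_list, "coconut count of - 5")

# \\D* allows any number of non-digit characters before the captured digits.
# str_match returns a character matrix: column 1 is the full match,
# column 2 is the first capture group.
counts <- str_match(extended_list, "(?i)count of\\D*(\\d+)")[, 2]
# [1] NA   NA   NA   NA   "5"  "50" "10" "5"
```

Elements without a match come back as NA directly, so no `as.numeric` step is needed to filter them. Note that `\\D*` also lets letters sit between "count of" and the number; replacing it with `[^0-9A-Za-z]*` would restrict the separator to non-alphanumeric characters.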

score 3 · Answer 2 · answered Mar 11 '16 at 18:24

Lookaheads and lookbehinds are what you are looking for here:

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "(?<=count of )[0-9]*")
[1] NA   NA   NA   NA   "5"  "50" NA

score 2 · Answer 3 · answered Mar 11 '16 at 18:24

as.numeric(sub("(?i).*count of.*?(\\d+).*", "\\1", shopping_list))
[1] NA NA NA NA  5 50 10

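The regex pattern breaks down as:

(?i): Ignore case
.*count of: Any characters up to and including "count of"
.*?: As few characters as possible before the digits
(\\d+): Capture one or more digits
.*: The rest of the string
"\\1": Replace the whole match with the capture group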

As of now, the other answers will fail on something like "coconut count of - 5", since they allow only a single character between "count of" and the number.
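A caveat worth illustrating: `sub` returns a string unchanged when the pattern does not match, so the NAs above come from `as.numeric` failing to parse the leftover text (with a coercion warning), not from the regex itself. The sketch below shows the intermediate result and one possible guard with `grepl`. The names `replaced`, `has_count`, `guarded`, and `trap` are illustrative, and `perl = TRUE` is an assumption added here so the lazy quantifier and inline flag are handled by PCRE.

```r
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2",
                   "monkey coconut 3oz count of 5", "monkey coconut count of 50",
                   "chicken Count Of-10")

pattern <- "(?i).*count of.*?(\\d+).*"

# Non-matching elements pass through sub() untouched
replaced <- sub(pattern, "\\1", shopping_list, perl = TRUE)
# [1] "apples x4" "bag of flour" "bag of sugar" "milk x2" "5" "50" "10"

# Guard: keep the replacement only where the pattern actually matched
has_count <- grepl(pattern, shopping_list, perl = TRUE)
guarded <- ifelse(has_count, replaced, NA_character_)
# [1] NA   NA   NA   NA   "5"  "50" "10"

# Without the guard, an element that is already numeric slips through
trap <- as.numeric(sub(pattern, "\\1", "42", perl = TRUE))
# [1] 42
```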

edited Mar 11 '16 at 18:37

answered Mar 11 '16 at 18:24

Pierre L

