How to extract characters after a match from a string in R?

Question

I have a variable in a data frame that contains raw JSON text. Some observations contain a 14-digit number that I want to extract and some don't. When an observation has the number, it appears in this format:

{"blur": "10010010010010"

I want to extract the 14 digits after {"blur": " whenever the string contains this left-hand part. I tried str_extract, but my regex syntax is not the best. Any suggestions?

What about using a JSON parser? [Parse JSON with R](https://stackoverflow.com/questions/2061897/parse-json-with-r) — ctwheels, Mar 19 '18 at 13:59
Why not just use a positive lookbehind ? `(?<=\{"blur": ")(\d+)` ? — Zenoo, Mar 19 '18 at 14:01
use `dput(head(R_object))` to display the R object. The JSON text you posted looks like the "blur" and the number are separate, so how you entered the file into R may affect what is the correct pattern to use. — IRTFM, Mar 19 '18 at 14:04

G. Grothendieck · Answer 1 · 2018-03-22T13:46:53.883

If it's fully formed JSON you could use a JSON parser, but assuming that

- it's just fragments as shown in the question, or it is fully formed and you prefer to use regular expressions anyway
- each input has 0 or 1 occurrences of the digit string
- NA should be returned if there are 0 occurrences

then try this.

The second argument to strapply is the regular expression. strapply returns the portion matched by the capture group, i.e. the part of the regular expression within parentheses. The empty = NA argument tells it what to return for an input in which no match is found.

library(gsubfn)
s <- c('{"blur": "10010010010010"', 'abc') # test input

strapply(s, '{"blur": "(\\d+)"', empty = NA, simplify = TRUE)
## [1] "10010010010010" NA
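
For comparison, here is a minimal base R sketch of the positive lookbehind suggested in the comments, under the same assumptions as above (at most one occurrence per input, NA when there is none). The names `pattern`, `hit` and `digits` are only illustrative, and `\\d{14}` assumes the number is always exactly 14 digits, as the question states.

```r
s <- c('{"blur": "10010010010010"', 'abc')  # same test input as above

# Lookbehind requires the literal prefix {"blur": " without including it
# in the match; \\d{14} then matches exactly 14 digits.
pattern <- '(?<=\\{"blur": ")\\d{14}'

# regexpr() returns -1 for inputs with no match (perl = TRUE enables lookbehind)
m <- regexpr(pattern, s, perl = TRUE)
hit <- !is.na(m) & m != -1

# Start with NA everywhere, then fill in the inputs that matched
digits <- rep(NA_character_, length(s))
digits[hit] <- regmatches(s, m)
digits
## [1] "10010010010010" NA
```

This avoids an extra package at the cost of a few more lines. For a data frame column the same steps apply with the column in place of `s`.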
