
I have a variable in a data frame that contains raw JSON text. Some observations contain a 14-digit number that I want to extract, and some don't. When an observation has the number, it appears in this format:

{"blur": "10010010010010"

I want to extract the 14 digits that follow `{"blur": "` whenever that left-hand part of the string is present. I tried `str_extract`, but my regex syntax is not the best. Any suggestions?

  • What about using a JSON parser? [Parse JSON with R](https://stackoverflow.com/questions/2061897/parse-json-with-r) – ctwheels Mar 19 '18 at 13:59
  • Why not just use a positive lookbehind ? `(?<=\{"blur": ")(\d+)` ? – Zenoo Mar 19 '18 at 14:01
  • use `dput(head(R_object))` to display the R object. The JSON text you posted looks like the "blur" and the number are separate, so how you entered the file into R may affect what is the correct pattern to use. – IRTFM Mar 19 '18 at 14:04
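The lookbehind suggested in the comments can be tried directly with `str_extract`. This is only a minimal sketch: it assumes the stringr package is installed, that the prefix is exactly `{"blur": "` with a single space after the colon, and that the number is always 14 digits. The vector `s` is made-up test input, not the real data.

```r
library(stringr)

# Hypothetical test input: one element with the fragment, one without
s <- c('{"blur": "10010010010010"', 'abc')

# (?<=...) is a lookbehind: the prefix must precede the digits,
# but it is not included in the returned match.
digits <- str_extract(s, '(?<=\\{"blur": ")\\d{14}')
digits
## [1] "10010010010010" NA
```

`str_extract` returns `NA` for elements without a match, so the result lines up with the rows of the data frame.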

1 Answer


If it's fully formed JSON you could use a JSON parser, but assuming

  • the input is just fragments as shown in the question, or it is fully formed and you prefer to use regular expressions anyway
  • each input has 0 or 1 occurrences of the digit string
  • inputs with 0 occurrences should give NA

then try this.

The second argument to `strapply` is the regular expression. `strapply` returns the portion matched by the capture group, i.e. the part of the regular expression within parentheses. The `empty = NA` argument tells it what to return if no occurrences are found.

library(gsubfn)
s <- c('{"blur": "10010010010010"', 'abc') # test input

strapply(s, '{"blur": "(\\d+)"', empty = NA, simplify = TRUE)
## [1] "10010010010010" NA 
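If adding a package is not an option, roughly the same result can be had in base R with `regexec` and `regmatches`. This is a sketch under the same assumptions as above (0 or 1 occurrences per input, NA when there is none). It additionally assumes the number is exactly 14 digits, as stated in the question, and the names `pat`, `m`, and `digits` are only illustrative.

```r
s <- c('{"blur": "10010010010010"', 'abc')  # same test input as above

# Literal {"blur": " followed by exactly 14 captured digits and a closing quote
pat <- '\\{"blur": "(\\d{14})"'

# regexec() records the full match and each capture group;
# regmatches() gives character(0) for inputs that do not match
m <- regmatches(s, regexec(pat, s, perl = TRUE))

# Element 2 of each match is the capture group; use NA where nothing matched
digits <- vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                 character(1), USE.NAMES = FALSE)
digits
## [1] "10010010010010" NA
```

Escaping the opening brace as `\\{` keeps the pattern unambiguous for regex engines that treat `{` as the start of a repetition count.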
G. Grothendieck