Extract array of objects from string in R

Question

Suppose I have the following string:

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

I would like to extract all objects inside the array "[{...}, {...}]", but not standalone objects "{...}" (in this example, var notOfInterest).

Desired Output:

> list(c("{a:c, d:f}", "{aa:cc, dd:ff}"))
[[1]]
[1] "{a:c, d:f}"     "{aa:cc, dd:ff}"

What I tried:

Building on this question/answer: find json in string with R, I tried to add brackets to the pattern. For debugging purposes I first tried a simplified input string, str2 <- "var x = 1; var b = [{a:c, d:f}]; var notOfInterest = {cc:r, dd:w}" (str is still my target string). Even that step doesn't work. After that I would have to add an optional comma and allow the object to appear multiple times.

gregexpr(
  pattern = "[\\{(?:[^{}]|(?R))*?\\}]",
  perl = TRUE,
  text = str
) %>%
  regmatches(x = str)

I also tried the "fixed = TRUE" parameter, escaping the brackets, and a few other options. I am happy to post the code for those, but I guess the question would get too long.

akrun · Answer 1 · 2020-01-04T17:37:47.187

2

We can use gsub with grep

strsplit(gsub(".*\\[|\\]", "", grep("\\},", strsplit(str, ";")[[1]], 
            value = TRUE)), ", (?=\\{)", perl = TRUE)
#[[1]]
#[1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
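
The nested call above can be unpacked into intermediate steps to make each stage visible. This is only a sketch of the same approach; the variable names (statements, array_stmt, array_body, result) are illustrative and not from the original answer. Note that the grep("\\},", ...) filter assumes the array holds at least two objects, so it would skip a single-element array such as str2 from the question.

```r
str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

# 1. Split the input into statements at each semicolon.
statements <- strsplit(str, ";")[[1]]

# 2. Keep only statements containing "},", i.e. an object followed by another one.
array_stmt <- grep("\\},", statements, value = TRUE)

# 3. Remove everything up to and including "[", and the closing "]".
array_body <- gsub(".*\\[|\\]", "", array_stmt)

# 4. Split on ", " only where a "{" follows, so the commas inside objects are kept.
result <- strsplit(array_body, ", (?=\\{)", perl = TRUE)
result
```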

Another option is rm_square from qdapRegex, which returns the text between the square brackets as a single string:

library(qdapRegex)
rm_square(str, extract = TRUE)
#[[1]]
#[1] "{a:c, d:f}, {aa:cc, dd:ff}"

edited Jan 04 '20 at 17:37

answered Jan 04 '20 at 16:34

akrun

score 1 · Answer 2 · answered Jan 04 '20 at 16:38

Use gsub and strsplit.

strsplit(gsub("^.*?(\\{.*?\\}).*(\\{.*?\\}).*$", "\\1£\\2", str), "£")
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"

answered Jan 04 '20 at 16:38

jay.sf

score 1 · Accepted Answer · answered Jan 04 '20 at 16:42

You can do this with the stringr package:

library(stringr)
str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

To match only the first occurrence (the result is a character vector):

str_extract(str, "(?<=\\[).+?(?=\\])")
# [1] "{a:c, d:f}, {aa:cc, dd:ff}"

To get all occurrences (the result is a list):

str_extract_all(str, "(?<=\\[).+?(?=\\])")
# [[1]]
# [1] "{a:c, d:f}, {aa:cc, dd:ff}"
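
Both calls above return the array body as one string. If the separate objects from the question's desired output are needed, one possible follow-up is to split that string at each comma that is followed by an opening brace. This is a sketch rather than part of the original answer; the split pattern and the variable names array_body and objects are assumptions.

```r
library(stringr)

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

# Text between the first "[" and the next "]".
array_body <- str_extract(str, "(?<=\\[).+?(?=\\])")

# Split at a comma (plus optional whitespace) only when "{" follows,
# so the commas inside each object are left alone.
objects <- str_split(array_body, ",\\s*(?=\\{)")
objects
```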

score 1 · Answer 4 · answered Jan 04 '20 at 22:13

Here are some other solutions.

For your debugging purposes

Note: "[" and "]" must be escaped to be treated as literal text, because they are special characters in regular expressions.

str2 <- "var x = 1; var b = [{a:c, d:f}]; var notOfInterest = {cc:r, dd:w}" 

library(magrittr)  # provides %>%

str2 %>% regexpr(
  pattern = "\\[\\{((\\s|\\S)+)\\}\\]",
  perl = TRUE
) %>%
  regmatches(x = str2)
#[1] "[{a:c, d:f}]"
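
The note about escaping can be checked directly. In this small sketch (the sample string x and the result names are illustrative), an unescaped "[{]" is read as a character class that matches a single "{", while the escaped form matches the literal two-character sequence "[{".

```r
x <- "b = [{a:c}]"

# Unescaped: "[{]" is a character class containing only "{".
unescaped <- regmatches(x, regexpr("[{]", x, perl = TRUE))

# Escaped: "\\[\\{" matches a literal "[" followed by a literal "{".
escaped <- regmatches(x, regexpr("\\[\\{", x, perl = TRUE))

unescaped  # "{"
escaped    # "[{"
```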

For your main purpose

Use gsub and strsplit (both calls receive their input through the pipe):
- Extract the array text.
  
  gsub(pattern = "^.+\\[(.+)\\].+$", perl = TRUE, replacement = "\\1")
  - pattern = "^.+\\[(.+)\\].+$" : "(.+)" captures the text between "[" and "]" as a group. In this example, the group is "{a:c, d:f}, {aa:cc, dd:ff}".
  - replacement = "\\1" : replaces the whole string with the captured group.
- Extract the objects in the array.
  
  strsplit(split = "(?<=([\\{\\}]))\\,\\s", perl = TRUE)
  - split = "(?<=([\\{\\}]))\\,\\s" : splits the array text at each ", " that immediately follows a brace, which here is the ", " between "}" and "{".

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

str %>%
  gsub(pattern = "^.+\\[(.+)\\].+$", perl = TRUE, replacement = "\\1") %>%
  strsplit(split = "(?<=([\\{\\}]))\\,\\s", perl = TRUE)
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
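
The gsub pattern above assumes exactly one array, with text both before and after it. If the input may contain several arrays, one way to generalize is to find every "[...]" span first and split each one separately. The helper below is a hypothetical sketch in base R, not part of the original answer; the name extract_array_objects is made up, and it assumes arrays are not nested and objects are separated by a comma followed by "{".

```r
# Hypothetical helper: returns one character vector of objects
# per non-nested "[...]" array found in the string `x`.
extract_array_objects <- function(x) {
  # Find every "[...]" span that contains no further square brackets.
  arrays <- regmatches(x, gregexpr("\\[[^\\[\\]]*\\]", x, perl = TRUE))[[1]]
  # Strip the surrounding "[" and "]".
  bodies <- gsub("^\\[|\\]$", "", arrays)
  # Split each body at a comma that is followed by "{".
  strsplit(bodies, ",\\s*(?=\\{)", perl = TRUE)
}

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"
extract_array_objects(str)
```

Standalone objects such as var notOfInterest are ignored because they are never inside square brackets.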

I hope this helps :)
