0

Suppose I have the following string:

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

I would like to extract all objects inside the array "[{...}, {...}]", but not standalone objects "{...}" (in this example: var notOfInterest).

Desired Output:

> list(c("{a:c, d:f}", "{aa:cc, dd:ff}"))
[[1]]
[1] "{a:c, d:f}"     "{aa:cc, dd:ff}"

What I tried:

Building on this question/answer: "find json in string with R", I tried to add brackets to the pattern. For debugging purposes I used a simplified input string, str2 <- "var x = 1; var b = [{a:c, d:f}]; var notOfInterest = {cc:r, dd:w}" (str is still my target string). Even that step doesn't work. After that, I would still have to add an optional comma and allow the object to appear multiple times.

gregexpr(
  pattern = "[\\{(?:[^{}]|(?R))*?\\}]",
  perl = TRUE,
  text = str
) %>%
  regmatches(x = str)

I also tried the fixed = TRUE parameter, escaping the brackets, and a few other options. I am happy to post the code for those, but I guess the question would get too long.

Tlatwork

4 Answers

2

We can combine strsplit, grep and gsub:

strsplit(gsub(".*\\[|\\]", "", grep("\\},", strsplit(str, ";")[[1]], 
            value = TRUE)), ", (?=\\{)", perl = TRUE)
#[[1]]
#[1] "{a:c, d:f}"     "{aa:cc, dd:ff}"

Another option is rm_square from qdapRegex

library(qdapRegex)
rm_square(str, extract = TRUE)
#[[1]]
#[1] "{a:c, d:f}, {aa:cc, dd:ff}"
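
Since rm_square returns the array contents as one string, one more step is needed to get the objects separately. A minimal base R sketch, assuming the objects are not nested; the variable names inner and objects are only illustrative:

```r
# Array contents, as returned by rm_square(str, extract = TRUE) above
inner <- "{a:c, d:f}, {aa:cc, dd:ff}"

# Extract every non-nested {...} group from that string
objects <- regmatches(inner, gregexpr("\\{[^{}]*\\}", inner, perl = TRUE))
objects
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
```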
akrun
1

Use gsub and strsplit.

strsplit(gsub("^.*?(\\{.*?\\}).*(\\{.*?\\}).*$", "\\1£\\2", str), "£")
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
jay.sf
1

You can do that with the stringr package:

library(stringr)
str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

To match only the first occurrence (the result is a character vector):

str_extract(str, "(?<=\\[).+?(?=\\])")
# [1] "{a:c, d:f}, {aa:cc, dd:ff}"

To get all occurrences (the result is a list):

str_extract_all(str, "(?<=\\[).+?(?=\\])")
# [[1]]
# [1] "{a:c, d:f}, {aa:cc, dd:ff}"
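
Both calls return the array contents as a single string. One possible way to get the objects individually in a single pass is a lookahead that only accepts a "{...}" when a closing "]" follows before any other bracket. This is a sketch in base R with perl = TRUE rather than stringr, and it assumes non-nested objects and balanced square brackets; pattern and in_array are illustrative names:

```r
str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

# Match a non-nested {...} only if a "]" follows before any other "[" or "]",
# i.e. only when the object sits inside square brackets
pattern <- "\\{[^{}]*\\}(?=[^\\[\\]]*\\])"
in_array <- regmatches(str, gregexpr(pattern, str, perl = TRUE))
in_array
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
```

The standalone object in var notOfInterest is skipped because no "]" follows it.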
Nareman Darwish
1

Here are some other solutions.

  • For your debugging purposes

Note: "[" and "]" are special characters in regular expressions, so they must be escaped to be matched as literal text.

library(magrittr)  # provides the %>% pipe

str2 <- "var x = 1; var b = [{a:c, d:f}]; var notOfInterest = {cc:r, dd:w}"

# Match the literal "[{", anything in between, and the literal "}]"
str2 %>% regexpr(
  pattern = "\\[\\{((\\s|\\S)+)\\}\\]",
  perl = TRUE
) %>%
  regmatches(x = str2)
#[1] "[{a:c, d:f}]"
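
As a small illustration of the escaping note above: an unescaped "[...]" is read as a character class, so it matches a single character from the set instead of the bracketed text. The input x is a made-up example:

```r
x <- "a[b]c"

# Unescaped: "[b]" is a character class that matches the single letter "b"
unescaped <- regmatches(x, gregexpr("[b]", x, perl = TRUE))[[1]]

# Escaped: matches the literal text "[b]", brackets included
escaped <- regmatches(x, gregexpr("\\[b\\]", x, perl = TRUE))[[1]]

unescaped
# [1] "b"
escaped
# [1] "[b]"
```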
  • For your main purpose

    Use gsub and strsplit

    • Extract the array text.

      gsub(pattern = "^.+\\[(.+)\\].+$",perl = T, replacement = "\\1")

      • pattern = "^.+\\[(.+)\\].+$" : the "(.+)" captures the text between "[" and "]" as a group. In this example, the group is "{a:c, d:f}, {aa:cc, dd:ff}".

      • replacement = "\\1" : replaces the whole string with the captured group.

    • Extract objects in the array.

      strsplit(split = "(?<=([\\{\\}]))\\,\\s", perl = T)

      • split = "(?<=([\\{\\}]))\\,\\s" : splits the array at each ", " that directly follows "{" or "}". Here that is the ", " between "}" and "{".
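
library(magrittr)  # provides the %>% pipe

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"

# Step 1: keep only the text between "[" and "]"; step 2: split it into objects
str %>% gsub(pattern = "^.+\\[(.+)\\].+$",
             perl = TRUE,
             replacement = "\\1") %>% strsplit(split = "(?<=([\\{\\}]))\\,\\s", perl = TRUE)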
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
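
The two steps can also be wrapped in a small function that copes with more than one array in the string. This is only a sketch: extract_array_objects is a hypothetical helper name, and it assumes that neither arrays nor objects are nested.

```r
# Hypothetical helper: returns one character vector of objects per [...] array
extract_array_objects <- function(x) {
  # Step 1: find every [...] span (assumes arrays are not nested)
  arrays <- regmatches(x, gregexpr("\\[[^\\[\\]]*\\]", x, perl = TRUE))[[1]]
  # Step 2: find every {...} object inside each span (assumes no nesting)
  lapply(arrays, function(a) {
    regmatches(a, gregexpr("\\{[^{}]*\\}", a, perl = TRUE))[[1]]
  })
}

str <- "var x = 1; var b = [{a:c, d:f}, {aa:cc, dd:ff}]; var notOfInterest = {cc:r, dd:w}"
result <- extract_array_objects(str)
result
# [[1]]
# [1] "{a:c, d:f}"     "{aa:cc, dd:ff}"
```

A string without any "[...]" span yields an empty list.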

I hope it will help you :)

Hsiang Yun Chan