await page.on("response", async (response) => {
  const request = await response.request();
  if (
    request.url().includes("https://www.jobs.abbott/us/en/search-results")
  ) {
    const text = await response.text();
    const root = await parse(text);
    root.querySelectorAll("script").map(async function (n) {
      if (n.rawText.includes("eagerLoadRefineSearch")) {
        const text = await n.rawText.match(
          /"eagerLoadRefineSearch":(\{.*\})\,/,
        );
        const refinedtext = await text[0].match(/\[{.*}\]/);
        //console.log(refinedtext);
        console.log(JSON.parse(refinedtext[0]));
      }
    });
  }
});

The snippet above intercepts a response whose body is plain text. I want to extract eagerLoadRefineSearch : { ... } (including its contents) as text with a regex, then run JSON.parse on the extracted text so that I end up with a JSON object for "eagerLoadRefineSearch" : {}.

I am using Puppeteer to intercept the response. I just need a correct regex that captures the whole object text of "eagerLoadRefineSearch" : {} (with its contents).

The response text from the server is shared at https://codeshare.io/bvjzJA .

furas
ShravanKaja
  • I don't know if `regex` can get it. It needs to count `{` and `}`: the match must contain the same number of `{` and `}` – furas Mar 18 '22 at 07:01
  • But I guess a regex can match a pattern that allows nested objects, such as "eagerLoadRefineSearch" : { "jobs" : [ ], status : 200, hits : 500 }. The regex /"eagerLoadRefineSearch":(\{.*\})\,/ starts at the correct point, but it also captures the text that follows eagerLoadRefineSearch. – ShravanKaja Mar 18 '22 at 07:16

1 Answer


Context

Silly mistakes

The text you are parsing has no double quotes around eagerLoadRefineSearch, so a pattern that starts with "eagerLoadRefineSearch" cannot match. The object to match also spans several lines, and . does not match a newline, so use [\s\S] instead (the s flag has the same effect; the m flag only changes how ^ and $ behave). Refer to how-to-use-javascript-regex-over-multiple-lines.

Also, don't use await on the string method match. It is synchronous and returns either an array or null.
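To make the newline point concrete, here is a minimal sketch. The sample string is made up for illustration and is far smaller than the real page.

```javascript
// Hypothetical two-line sample standing in for the real script text.
const sample = 'eagerLoadRefineSearch: {\n  status: 200\n}';

// `.` stops at line breaks, so this only matches braces on one line.
const dotPattern = /\{.*\}/;
// [\s\S] matches any character, including newlines.
const anyCharPattern = /\{[\s\S]*\}/;
// The s (dotAll) flag makes `.` match newlines as well.
const dotAllPattern = /\{.*\}/s;

console.log(dotPattern.test(sample));     // false
console.log(anyCharPattern.test(sample)); // true
console.log(dotAllPattern.test(sample));  // true

// match is synchronous, so no await is needed.
const found = sample.match(anyCharPattern);
console.log(found[0]);
```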

Matching the closing brace

A quick search on this topic led me to this link and, as I suspected, this is complicated. To simplify the problem I assumed that the text is correctly indented. We can then match on the indentation level to find the closing brace with this pattern.

/(?<indent>[\s]+)\{[\s\S]+\k<indent>\}/gm

This works if both the opening and the closing braces are at the same level of indentation. They are not in our case, since eagerLoadRefineSearch: sits between the indent and the opening brace, but we can account for this.

const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm
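As a quick check of the indentation idea, the sketch below runs the pattern against a small, correctly indented snippet. The snippet, including the phApp and other names, is invented for illustration and only assumes the same layout as the real script.

```javascript
// Hypothetical, correctly indented stand-in for the script text.
const script = [
  'var phApp = {',
  '  eagerLoadRefineSearch: {',
  '    status: 200,',
  '    data: {',
  '      jobs: [],',
  '    },',
  '  },',
  '  other: {',
  '    status: 404,',
  '  },',
  '};',
].join('\n');

const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm;

// The lazy [\s\S]+? stops at the first "}" that follows the captured indent,
// which is the brace closing eagerLoadRefineSearch, not a nested one.
const matches = script.match(reMatchObject);
console.log(matches.length);
console.log(matches[0]);
```

The nested data object closes at a deeper indent, so its brace is skipped, and the sibling other object is never reached.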

Valid JSON

As mentioned earlier, the keys lack double quotes, so let's replace every key with "key".

const reMatchKeys = /(\w+):/gm
const impure = 'hello: { name: "nammu", age: 18, subjects: { first: "english", second: "mythology"}}'
const pure = impure.replace(reMatchKeys, '"$1":')
console.log(pure)

Then we get rid of the trailing commas. Here's the regex that worked for this example.

const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm

Once we chain these replace calls, the data is ready for JSON.parse.
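Here is a minimal sketch of the two replacements chained together. The input is a made-up fragment with unquoted keys and trailing commas, not the real response.

```javascript
const reMatchKeys = /(\w+):/gm;
const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm;

// Hypothetical fragment: unquoted keys, trailing commas, no colons in values.
const raw = [
  'eagerLoadRefineSearch: {',
  '  status: 200,',
  '  hits: 500,',
  '  data: {',
  '    jobs: [',
  '      { title: "Engineer", },',
  '    ],',
  '  },',
  '}',
].join('\n');

// 1. Drop commas that are followed only by whitespace and a closing ] or }.
const noTrailingCommas = raw.replace(reMatchTrailingCommas, '');
// 2. Quote the keys, then wrap everything in braces to form one JSON object.
const validJSONString = '{' + noTrailingCommas.replace(reMatchKeys, '"$1":') + '}';

const parsedObject = JSON.parse(validJSONString);
console.log(parsedObject.eagerLoadRefineSearch.hits); // 500
```

Note that this key pattern would also rewrite a colon inside a string value, such as the one in a URL, so it is only safe on data without such values.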

Code

// Assumes `parse` comes from node-html-parser, as the use of rawText suggests.
page.on('response', async (response) => {
  // request() and url() are synchronous; only text() returns a promise.
  const request = response.request();
  if (
    !request
      .url()
      .includes('https://www.jobs.abbott/us/en/search-results')
  ) {
    return;
  }
  const text = await response.text();
  const root = parse(text);
  const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm;
  const reMatchKeys = /(\w+):\s/g;
  const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm;
  for (const n of root.querySelectorAll('script')) {
    const data = n.rawText;
    if (!data.includes('eagerLoadRefineSearch')) continue;
    // match returns null when nothing matches, so fall back to an empty array.
    const parsedStringArray = data.match(reMatchObject) || [];
    for (const parsed of parsedStringArray) {
      // Strip trailing commas, quote the keys, and wrap in braces for JSON.parse.
      const noTrailingCommas = parsed.replace(reMatchTrailingCommas, '');
      const validJSONString = '{' + noTrailingCommas.replace(reMatchKeys, '"$1":') + '}';
      console.log(JSON.parse(validJSONString));
    }
  }
});
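If the indentation assumption does not hold, counting braces, as suggested in the comments on the question, is an alternative to the regex. The sketch below is not part of the approach above. The helper extractObjectText is hypothetical, and it assumes that string values use double quotes only.

```javascript
// Hypothetical helper: returns the text of the object that follows `key`,
// found by counting braces instead of relying on indentation.
function extractObjectText(source, key) {
  const keyIndex = source.indexOf(key);
  if (keyIndex === -1) return null;
  const start = source.indexOf('{', keyIndex);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true; // braces inside strings must not be counted
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // the braces never balanced
}

// Made-up input with a brace inside a string and a sibling object.
const sampleText =
  'x = { eagerLoadRefineSearch: { status: 200, note: "a } in a string", data: { jobs: [] } }, other: { a: 1 } };';
const objectText = extractObjectText(sampleText, 'eagerLoadRefineSearch');
console.log(objectText);
```

The extracted text still has unquoted keys, so it needs the same key and trailing-comma clean-up before JSON.parse.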
Nikhil Devadiga