I have a JavaScript function that matches text between curly braces (including the braces) in order to extract a JSON string from different posts. The function is below:

function eventObject(value) {
   var json = value.match(/{([^}]+)}/)[0]; // matches from the first "{" up to the first "}"
   return json;
}

The input looks like this:

{ 
 "host": "Host link..",
 "info": "Info text...",
 "links": [ {"title": "link_title", "url": "link_url"}, {"title": "link_title_2", "url": "link_url_2"} ],
 "category": "campfire"
}

The problem is that this text contains nested objects with their own curly braces, so the match ends at the first closing brace inside "links". How can I prevent this and get the full string?


Update

I realised that, to simplify my question, I left out some important information: the API response is a string of raw HTML that contains the string I would like to parse as an object. The typical raw HTML looks like this:

"cooked":"<pre><code class=\"lang-auto\">{\n\"host\": \"host_link\",\n\"info\": \"event_info\",\n\"links\": [{\"title\": \"link_title \", \"url\": \"link_url\"},{\"title\": \"link_two_title \", \"url\": \"link_two_url\"} ],\n\"category\": \"category\"\n}\n</code></pre>"}

The challenge is extracting the entire string between <code></code> and parsing it into an object. I have updated the title of the question to reflect this.

The following function successfully extracts the string and strips the HTML tags and line breaks, but it does not correctly parse the result into an object:

function eventObject(value){

  const doc = new DOMParser().parseFromString(value, "text/html");
  var json = [...doc.querySelectorAll('code')].map(code => code.textContent); // DOMParser extracts text between <code> tags
  var final = String(json).replace(/\n/g, " ").replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"'); // removes line breaks and replaces curly quotes with straight quotes
  var string = JSON.stringify(final);
  var obj = JSON.parse("'" + string + "'");
  return obj;

}
ogot
  • JSON and regex are not good friends. Use a parser, it is simpler, faster and much more maintainable. – Toto Mar 23 '20 at 16:17
  • @Toto that reminds me of this answer https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags :D – Guerric P Mar 23 '20 at 16:26
  • 1. Use an HTML parser. 2. Use a JSON parser. – Toto Mar 23 '20 at 17:18
  • @Toto The HTML parser works in the function I added in the update. The JSON parser, however, does not. – ogot Mar 23 '20 at 17:23

1 Answer

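Your eventObject function is nearly right, but the JSON.stringify call has to go. JSON.stringify serializes JavaScript values, and you are passing it a string, so it only returns that string wrapped in double quotes with the inner quotes escaped. Adding single quotes around the result then makes the input invalid JSON. Pass the cleaned-up text straight to JSON.parse instead: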

function eventObject(value){

  const doc = new DOMParser().parseFromString(value, "text/html");
  var json = [...doc.querySelectorAll('code')].map(code => code.textContent); // DOMParser extracts text between <code> tags
  var final = String(json).replace(/\n/g, " ").replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"'); // removes line breaks and replaces curly quotes with straight quotes
  // var string = JSON.stringify(final);
  var obj = JSON.parse(final);
  return obj;

}

var value = '"cooked":"<pre><code class=\"lang-auto\">{\n\"host\": \"host_link\",\n\"info\": \"event_info\",\n\"links\": [{\"title\": \"link_title \", \"url\": \"link_url"},{\"title\": \"link_two_title \", \"url\": \"link_two_url\"} ],\n\"category\": \"category\"\n}\n</code></pre>"}';


console.log(eventObject(value))
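
To see why the extra JSON.stringify step gets in the way, here is a minimal sketch that runs without a DOM. The `text` value is a shortened, hypothetical stand-in for what the <code> element would contain, not the exact API response:

```javascript
// Hypothetical, shortened stand-in for the text extracted from <code>.
const text = '{"host": "host_link", "links": [{"title": "link_title", "url": "link_url"}]}';

// JSON.stringify on a string wraps it in double quotes and escapes the inner ones.
const literal = JSON.stringify(text);

// Parsing that literal round-trips to the original string, not to an object.
const roundTrip = JSON.parse(literal);

// Wrapping the literal in single quotes is not valid JSON, so JSON.parse throws.
let quotedError = null;
try {
  JSON.parse("'" + literal + "'");
} catch (err) {
  quotedError = err;
}

// Parsing the text directly copes with the nested braces and returns an object.
const parsed = JSON.parse(text);

console.log(typeof roundTrip);                    // "string"
console.log(quotedError instanceof SyntaxError);  // true
console.log(parsed.links[0].title);               // "link_title"
```

Since JSON.parse already understands nesting, no regex for the inner braces is needed once the HTML parser has isolated the text.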
Kenan Güler