I have a file the contents of which are formatted as follows:

{
  "title": "This is a test }",
  "date": "2017-11-16T20:47:16+00:00"
}

This is a test }

I'd like to extract just the JSON heading with a regex (which I think is the most sensible approach here). However, the JSON might change in the future (e.g. fields might be added, or the current fields could change), so I'd like to keep the regex flexible. I have tried the solution suggested here, but it seems too simplistic for my use case: the file above "tricks" the regex, as shown in this regex101.com example.

Since my knowledge of regex is not that advanced, I'd like to know whether there's a regex approach that is able to cover my use case.

Thank you!

finferflu
  • Why wouldn't you want to parse the JSON? – Paul S. Nov 16 '17 at 21:12
  • What exactly do you want to extract? Can you give example input and output? Also, what programming language are you using? – H H Nov 16 '17 at 21:16
  • Why do you think using regex is a more sensible approach than using a JSON parser? – anubhava Nov 16 '17 at 21:21
  • No need for regex if it always starts with \n{ and ends with \n} – Slai Nov 16 '17 at 21:28
  • There is little chance to parse it the right way with regex. You may try to get it with https://regex101.com/r/4Ds3sO/2 though. But Slai's comment is hinting that a non-regex solution might be simpler. – Wiktor Stribiżew Nov 16 '17 at 21:32
  • Why don't you just fix your file so it contains valid JSON instead of JSON with extra stuff after it? – Barmar Nov 16 '17 at 21:58
  • To answer your questions about JSON parsing: I do want to parse the JSON, but first I need to extract it from the file, i.e. I need to separate it from the rest of the content. The JSON here works as the file's heading, containing metadata so I need it to be in this format. – finferflu Nov 16 '17 at 23:51
  • @Slai what if the last line of the file, outside of the JSON heading also ends with `\n}`? – finferflu Nov 16 '17 at 23:52
  • @WiktorStribiżew Thanks for your input! It seems to work so far, but I'll have to test it with more scenarios. – finferflu Nov 16 '17 at 23:53
  • If the JSON is properly indented like in your example, you will need the first \n} – Slai Nov 17 '17 at 00:03
  • @Slai yes, it will always be indented that way, because it's generated by Javascript's `JSON.stringify()`. Could you please elaborate on your solution? Thanks! – finferflu Nov 17 '17 at 00:06

2 Answers

If the JSON always starts with { at the left margin and ends with } on a line of its own at the left margin, with everything else indented as you show, you can use the regular expression

/^{.*?^}$/ms

The m modifier makes ^ and $ match at the beginning and end of each line, not just of the whole string. The s modifier allows . to match newlines; it was added in ES2018, so older engines need [\s\S] in place of the dot.

var str = `{
  "title": "This is a test }",
  "date": "2017-11-16T20:47:16+00:00"
}

This is a test }
`;

var match = str.match(/^{.*?^}$/ms);
var data = null;
if (match) {
  // match[0] runs from the opening { to the first line that consists only of }
  data = JSON.parse(match[0]);
}
console.log(data);
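
For engines that lack the s flag, a variant along these lines might work. This is only a sketch: extractHeading, meta and body are made-up names, and it assumes the heading was produced by JSON.stringify with indentation (as stated in the comments on the question), so its closing brace is the first } that sits alone on a line.

```javascript
// Hypothetical helper: splits a file into its JSON heading and the remaining body.
// Assumption: the heading's closing brace is the first "}" alone on a line.
function extractHeading(text) {
  // [\s\S] matches any character, newlines included, so the s flag is not needed.
  var match = text.match(/^\{[\s\S]*?^\}$/m);
  if (!match) {
    return null; // no heading found
  }
  return {
    meta: JSON.parse(match[0]),
    // Everything after the matched heading is treated as the body.
    body: text.slice(match.index + match[0].length)
  };
}

var sample = '{\n  "title": "This is a test }",\n  "date": "2017-11-16T20:47:16+00:00"\n}\n\nThis is a test }\n';
var result = extractHeading(sample);
console.log(result.meta.title);
console.log(JSON.stringify(result.body));
```

Returning the body alongside the parsed metadata avoids running a second search to find where the heading ends.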
Barmar
  • Thanks for this solution. I believe it is the same one that [@Wiktor Stribiżew](https://stackoverflow.com/users/3832970/wiktor-stribiżew) suggested in a comment on my original question. I do appreciate the time you took to explain what each bit does. As I mentioned above, I'll run this regex through a few more scenarios and then update this question. – finferflu Nov 16 '17 at 23:59

You can look for the first occurrence of \n} and take the substring up to and including it:

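const s = `{
  "title": "This is a test }",
  "date": "2017-11-16T20:47:16+00:00"
}
This is a test }
}`

// The first newline followed by } closes the heading, because every line inside it is indented.
const i = s.indexOf('\n}')

if (i > 0) {
  const json = s.slice(0, i + 2) // keep the "\n}" itself
  const o = JSON.parse(json)
  console.log(json); console.log(o)
}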

Or, a bit shorter, with a regex:

s = `{
  "title": "This is a test }",
  "date": "2017-11-16T20:47:16+00:00"
}
This is a test }
}`

// replace() is used here only to run the callback on the first match; its result is discarded.
s.replace(/.*?\n}/s, function(m) {
  o = JSON.parse(m)
  console.log(m); console.log(o)
})
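
Why the first \n} is a safe cut point can be checked with a small round trip. The sketch below assumes the heading is a non-empty object written with JSON.stringify(obj, null, 2); splitHeading and the sample metadata are made up for illustration.

```javascript
// Hypothetical metadata; the title deliberately contains a raw newline followed by "}".
const meta = {
  title: "Tricky\n} title",
  nested: { ok: true }
};

// JSON.stringify escapes the newline inside the string as the two characters \ and n,
// and indents the closing brace of the nested object, so the only real newline
// immediately followed by "}" is the one that ends the heading.
const heading = JSON.stringify(meta, null, 2);
const file = heading + "\n\nBody text }\n}";

// Hypothetical helper: cut the text at the first newline immediately followed by "}".
function splitHeading(text) {
  const i = text.indexOf("\n}");
  if (i < 0) {
    return null; // no multi-line JSON heading found
  }
  return {
    meta: JSON.parse(text.slice(0, i + 2)),
    body: text.slice(i + 2)
  };
}

const parts = splitHeading(file);
console.log(parts.meta.title === meta.title);
console.log(JSON.stringify(parts.body));
```

Two cases fall outside this sketch: an empty object is serialized as {} with no newline, and a top-level array ends with \n] rather than \n}.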
Slai
  • This is actually a clever way to approach this problem, and I believe it’s even more foolproof than using a regex. Thanks for your input! – finferflu Nov 17 '17 at 01:52