I was looking at JSON data that was just in a text file. I don't want to do anything aside from using regex to get the values in between quotes. I'm using this as a way to practice regex and got to a point that seems like it should be simple, but it turns out it's not (at least to me and a few other people at the office). I've matched complicated URLs with ease in regex, so I'm not completely new to it. This just seems like a weird case to me.

I've tried:

/(?:")(.*?)(?:")/

/"(.*?)"/

and several others but these got me the closest.

Basically we can forget that it's JSON and just say I want to match the words value and stuff out of "value" and "stuff". Everything I try includes the quotes, so I'd have to clean the strings afterwards of the delimiters or else the string is literally "value" with the quotes.

Any help would be much appreciated, whether this is simple or complicated, I'd love to know! Thanks

Update: Alright, so I think I'll go with (?<=")(.*?)(?=") and read things line by line, without the global flag, so I just get the first match on each line. In my code I was just pasting a huge string into a var instead of actually opening a file with AJAX/FileReader or having a form set up to input data. I think I'll mark this as solved, much appreciated!

  • Can you show the code you are using to get the results out? Are you using the capturing groups, or the whole match, to get the text matched? Given your regexes, the text you want should be in capturing group 1, while the whole matched text will always include the quotes too. [demo](https://regex101.com/r/rNuvap/2) where group 1 contains what you want. – joanis Aug 31 '19 at 20:12
  • Besides using capturing groups, you could use zero-width assertions: `(?<=")(.*?)(?=")` [demo](https://regex101.com/r/rNuvap/1) where you'll notice the zero-width assertions mean quotes are not consumed and therefore we match between every quote instead of by pairs. – joanis Aug 31 '19 at 20:14
  • It seems like `(?<=")(.*?)(?=")` works pretty well. Is it possible to make it match just the first value that is between quotes on every line? Basically I want the key and not the value out of the key:value pair if you look at it from a JSON perspective. – Garfield910 Aug 31 '19 at 20:26
  • If you're processing your text file one line at a time, I think you'll just get the first match on each line, unless you ask for repeated matching on each line. – joanis Aug 31 '19 at 20:28

2 Answers

You have two choices to solve this problem:

Use capturing groups

You can match the delimiters and use capturing groups to get the text within. In this case your two regexes will work, but you need to access capturing group 1 to get the results (demo). See How do you access the matched groups in a JavaScript regular expression? for how to do that.
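
As a minimal sketch of the capturing-group approach (the sample string and the variable names are made up for illustration, they are not from the question):

```javascript
// Hypothetical sample input, not taken from the original question.
const line = '"value": "stuff"';

// Without the g flag, match() returns the whole match at index 0
// and each capturing group at index 1, 2, ...
const m = line.match(/"(.*?)"/);
const whole = m[0]; // whole match, quotes included
const inner = m[1]; // group 1: only the text between the quotes

console.log(whole); // "value"
console.log(inner); // value

// With the g flag, matchAll() yields one match array per quoted string,
// so group 1 can be collected from each of them.
const all = Array.from(line.matchAll(/"(.*?)"/g), (hit) => hit[1]);
console.log(all); // [ 'value', 'stuff' ]
```

The point is that the quotes are still part of the match, but they never end up in group 1, so no cleanup of the delimiters is needed afterwards.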

Use zero-width assertions

You can use zero-width assertions to match only the text within, requiring delimiters around it without actually matching them (demo):

(?<=")(.*?)(?=")

but since the quotes are no longer consumed, it finds the text between every two adjacent quotes, not just between opening and closing pairs: e.g., a"b"c" would find b and c.
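
A small sketch comparing the two patterns on the a"b"c" example may make the difference concrete. The variable names are illustrative, and the lookbehind version assumes a JavaScript engine that supports lookbehind (recent Chrome or Node.js, for instance):

```javascript
const sample = 'a"b"c"';

// Zero-width assertions: the quotes are checked but not consumed, so the
// closing quote of one match can serve as the opening quote of the next.
const lookaround = Array.from(
  sample.matchAll(/(?<=")(.*?)(?=")/g),
  (hit) => hit[0]
);
console.log(lookaround); // [ 'b', 'c' ]

// Capturing group: both quotes are consumed, so only complete pairs match.
const grouped = Array.from(
  sample.matchAll(/"(.*?)"/g),
  (hit) => hit[1]
);
console.log(grouped); // [ 'b' ]
```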

As for getting just the first match, that is the default in JavaScript: without the g (global) flag, match() and exec() return only the first match. You'd have to ask for repeated matching before you see the subsequent ones. So if you process your file one line at a time, you should get what you want.
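
Processing the text one line at a time could look roughly like the sketch below. The input text and the helper name firstQuotedPerLine are hypothetical, and the capturing-group pattern is used here (rather than the lookbehind one) so that it does not depend on lookbehind support:

```javascript
// Hypothetical helper: returns the first quoted string found on each line.
function firstQuotedPerLine(text) {
  return text
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .map((line) => line.match(/"(.*?)"/)) // no g flag: first match only
    .filter((m) => m !== null) // skip lines without any quoted text
    .map((m) => m[1]); // keep group 1, i.e. the text without quotes
}

// Made-up input resembling key/value lines.
const text = '"value": "stuff"\n"key": "other"\nno quotes here';
const keys = firstQuotedPerLine(text);
console.log(keys); // [ 'value', 'key' ]
```

On lines shaped like key/value pairs this returns the key, since it is the first quoted string on the line.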

joanis
  • Where is the lookbehind documented? I can't find that anywhere in the mozilla or w3c references. Only the lookahead. – Holli Aug 31 '19 at 21:08
  • Lookbehind is a more recent regex feature, you'd have to look at recent documentation. It's in many languages now, though possibly not in all yet. I know old versions of JavaScript did not support it. – joanis Aug 31 '19 at 21:46
  • @Garfield910: Lookbehind is not supported by JavaScript except in Chrome. – Toto Sep 01 '19 at 14:16
  • @Toto Hum, that will be a problem then. – joanis Sep 01 '19 at 15:37

get the values in between quotes

One thing to keep in mind is that valid JSON allows escaped quotes (\") inside quoted values, so the regex has to account for them when matching. The pattern below does this with the “unrolling-the-loop” technique: it matches runs of characters that are neither a quote nor a backslash, interleaved with backslash-escaped characters.

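// Matches a double-quoted string, allowing backslash-escaped characters inside.
var pattern = /"[^"\\]*(?:\\.[^"\\]*)*"/g;
var data = {
  "value": "This is \"stuff\".",
  "empty": "",
  "null": null,
  "number": 50
};
var dataString = JSON.stringify(data);
console.log(dataString);
// With the g flag, match() returns every quoted string (keys and values), or null if there are none.
var matched = dataString.match(pattern) || [];
// JSON.parse strips the surrounding quotes and resolves escapes such as \".
matched.forEach(item => console.log(JSON.parse(item)));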
ctaleck