50

I have a document from which I need to extract some data. The document contains strings like these:

Text:"How secure is my information?"

I need to extract the text in double quotes after the literal Text:, which here would be:

How secure is my information?

How do I do this with a regex in JavaScript?

codaddict
Raj

9 Answers

74

Lookbehind assertions were recently finalised for JavaScript and will be part of the next edition of the ECMA-262 specification (ES2018). At the time of writing, they are supported in Chrome 66 (Opera 53), but in no other major browser (see caniuse).

var str = 'Text:"How secure is my information?"',
    reg = /(?<=Text:")[^"]+(?=")/;

str.match(reg)[0];
// -> How secure is my information?
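The same lookbehind pattern also works with the g flag when the document contains many Text:"..." entries. A minimal sketch, assuming an engine with ES2018 lookbehind support; extractWithLookbehind is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: collects every quoted value that follows Text:.
// Assumes ES2018 lookbehind support; older engines reject this regex literal
// with a SyntaxError when the script is parsed.
function extractWithLookbehind(source) {
  // (?<=Text:") checks the prefix without consuming it, [^"]+ takes the
  // quoted text, and (?=") checks for the closing quote.
  const reg = /(?<=Text:")[^"]+(?=")/g;
  // With the g flag, match() returns every full match, or null if none.
  return source.match(reg) || [];
}

const doc = 'Text:"How secure is my information?" Text:"How to improve this?"';
extractWithLookbehind(doc);
// -> ["How secure is my information?", "How to improve this?"]
```

Because the lookarounds are not part of the match, each full match already is the wanted text, so no capture group is needed even with g.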

Older browsers do not support lookbehind in JavaScript regular expressions. For expressions like this one, use capturing parentheses instead:

var str = 'Text:"How secure is my information?"',
    reg = /Text:"([^"]+)"/;

str.match(reg)[1];
// -> How secure is my information?

This does not cover every lookbehind use case, however: unlike a lookbehind, the capturing version still consumes Text:" as part of the match.
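For older engines, the capturing-group version can be extended to every occurrence with a RegExp.prototype.exec loop, which keeps the capture groups that match() with the g flag discards. A sketch; extractWithGroups is a hypothetical name:

```javascript
// Hypothetical helper: capture-group extraction without lookbehind.
function extractWithGroups(source) {
  const reg = /Text:"([^"]+)"/g; // g: each exec() resumes at reg.lastIndex
  const results = [];
  let m;
  // exec() returns null once no further match exists.
  while ((m = reg.exec(source)) !== null) {
    results.push(m[1]); // m[0] is the whole match, m[1] the captured text
  }
  return results;
}

extractWithGroups('Text:"How secure is my information?" Text:"How to improve this?"');
// -> ["How secure is my information?", "How to improve this?"]
```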

x-yuri
Andy E
  • But how do I extract all such data from a large document? Say into an array or something? – Raj Aug 25 '10 at 19:00
  • You use the /g modifier like what I had in my answer. – CrayonViolent Aug 25 '10 at 22:57
  • @Raj: you can use the `/g` modifier, as Crayon Violent said. This is the global modifier: without it, the regular expression stops at the first match; with it, the regular expression continues until it has found all matches and returns an array containing all of them. – Andy E Aug 26 '10 at 08:21
  • Hm. I get undefined when I test the above code. When I log `str.match(reg)` without index 1, it outputs `["Text:"How secure is my information?""]`. Thoughts? – jmk2142 Jun 19 '13 at 14:42
  • @orangewarp: the `g` modifier had to be removed for the example in my answer to work. – Andy E Jun 19 '13 at 14:50
  • Cool. Works. Why does the **g** modifier kill the capturing parentheses? I'm wondering because I can imagine a scenario where you have multiple targets in a string. `str = 'Something more things " ...'` If I wanted all id values in an array, it seems you would want to use **g**, but then the capturing parentheses would be gone. What would be the best way here? `reg = /id="([^"]+)"/g;` -> ["id="12345"","id="qwerty""] -> then run a forEach with `/id="([^"]+)"/`? Can it be done in one regex step? – jmk2142 Jun 19 '13 at 15:55
  • @orangewarp: that's the standard behaviour for match with a global flag. It basically repeats a call to `exec` behind the scenes, returning only the first element from the result array in each iteration. The best solution is probably to run your own loop, calling `RegExp.prototype.exec` and parsing the result yourself, or use the [search and don't replace](http://ejohn.org/blog/search-and-dont-replace/) method (both of which are more or less the same, but the latter offers a little more convenience in some cases). – Andy E Jun 20 '13 at 07:45
  • @AndyE Good to know what happens behind the scenes. Follow-up: great link, it helped simplify my code. I've included a sample for other people to learn from. Thanks once again! **[JS Fiddle Example](http://jsfiddle.net/TvNA3/)** – jmk2142 Jun 20 '13 at 17:17
  • The only difference from a lookbehind is that `Text:"` is consumed by the match. If I want to replace the quoted text with `"…"`, for example, I have to do `str.replace(/(Text:")([^"]+)/, "$1…")` instead of `str.replace(/(?<=Text:")[^"]+/, "…")`. – v01pe Jun 24 '13 at 00:09
  • What about the case where the lookahead string is multi-character, i.e. instead of `"` it is `":Text`? E.g. input: `Text:"How secure is my information?":Text`, output: `How secure is my information?` – user1412066 Oct 23 '16 at 14:29
  • ufff! wasted quite some time fiddling with _firefox 68.0.1_ :-( thanks – Anand Rockzz Aug 05 '19 at 23:22
  • In 2020, [lookbehind expressions are still not supported in all browsers](https://caniuse.com/#search=lookbehind), which means that they should be avoided unless you can guarantee the use of a browser that supports them. – Dave F Mar 26 '20 at 12:52
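The "one regex step" asked about in these comments is possible in ES2020 engines with String.prototype.matchAll, which returns every match together with its capture groups. A sketch; captureAll and the sample markup are hypothetical, since the string in the comment did not survive:

```javascript
// Hypothetical helper: first capture group of every match of a global regex.
// matchAll() throws a TypeError if the regex lacks the g flag.
function captureAll(source, reg) {
  // Array.from's second argument maps each match array to its group 1.
  return Array.from(source.matchAll(reg), m => m[1]);
}

// Hypothetical sample input.
const markup = '<div id="12345"></div><span id="qwerty"></span>';
captureAll(markup, /id="([^"]+)"/g); // -> ["12345", "qwerty"]
```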
27

I just want to add something: JavaScript engines that predate ES2018 don't support lookbehinds such as (?<= ) or (?<! ).

But they do support lookaheads such as (?= ) or (?! ).
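Since lookbehind support depends on the engine, one option is to test for it at runtime and fall back to a capture group, using a lookahead (which older engines do support) for the closing quote. A sketch; supportsLookbehind and extractText are hypothetical names. The lookbehind pattern is built from a string so that engines without support can still parse the script:

```javascript
// Hypothetical helper: true if this engine accepts lookbehind assertions.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b'); // SyntaxError where lookbehind is unsupported
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: Text:"..." values, with or without lookbehind.
function extractText(source) {
  if (supportsLookbehind()) {
    // Built from a string, so older engines never parse a lookbehind literal.
    return source.match(new RegExp('(?<=Text:")[^"]+(?=")', 'g')) || [];
  }
  // Fallback: capture group for the prefix, lookahead for the closing quote.
  const reg = /Text:"([^"]+)(?=")/g;
  const results = [];
  let m;
  while ((m = reg.exec(source)) !== null) {
    results.push(m[1]);
  }
  return results;
}
```

Both branches return the same array, so callers do not need to know which one ran.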

Eddy
13

You can just do:

/Text:"(.*?)"/

Explanation:

  • Text:" : To be matched literally
  • .*? : To match anything in non-greedy way
  • () : To capture the match
  • " : To match a literal "
  • / / : delimiters
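The non-greedy .*? matters as soon as a line contains more than one quoted value. A small sketch with a made-up input line:

```javascript
// Made-up input with two quoted values on one line.
const line = 'Text:"first" Text:"second"';

// Greedy .* runs to the end of the line, then backtracks to the last quote.
const greedy = line.match(/Text:"(.*)"/)[1]; // 'first" Text:"second'

// Lazy .*? stops at the first closing quote.
const lazy = line.match(/Text:"(.*?)"/)[1]; // 'first'
```

Note that . does not match line breaks (without the s flag), so neither version crosses into the next line.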
codaddict
2

If you want to avoid regular expressions altogether, you can do:

var texts = file.split('Text:"').slice(1).map(function (text) {
  return text.slice(0, text.lastIndexOf('"')); 
});
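Wrapped in a function and run on a sample, the split approach looks like the sketch below. It swaps lastIndexOf for indexOf (a variation, not the answer's exact code): with lastIndexOf, a chunk that spans several lines keeps everything up to the last quote before the next Text:. The indexOf version assumes every value has a closing quote and no embedded quotes; extractBySplit is a hypothetical name:

```javascript
// Hypothetical helper: the split approach, stopping each chunk at the
// first closing quote (indexOf) instead of the last one (lastIndexOf).
function extractBySplit(source) {
  return source
    .split('Text:"') // chunk 0 is whatever precedes the first marker
    .slice(1)        // so drop it
    .map(chunk => chunk.slice(0, chunk.indexOf('"')));
}

const sample = 'Text:"How secure is my information?"someRandomTextHere\n' +
  'Voice:"Not very much"\n' +
  'Text:"How to improve this?"';
extractBySplit(sample);
// -> ["How secure is my information?", "How to improve this?"]
```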
Bill Criswell
2
string.match(/Text:"([^"]*)"/g)
CrayonViolent
  • How to avoid matching Text: in the result? – Raj Aug 25 '10 at 19:30
  • string[0] will always have the full regex match. string[1] will have the captured text. If there was a 2nd capture (parenthesis) in the regex, it would be put in string[2], etc... – CrayonViolent Aug 25 '10 at 19:59
  • I think with /g flag you will only get full regex match for all matches. – Raj Aug 25 '10 at 21:16
  • The g modifier will look for everything on the page that matches, not just stop at the first match. You said you have strings (plural), so that's why I put the /g modifier there. – CrayonViolent Aug 25 '10 at 22:54
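The disagreement in these comments comes down to how match() treats the g flag. A small sketch with a made-up input string:

```javascript
const s = 'Text:"one" Text:"two"';

// Without g: first match only, with its capture groups.
const single = s.match(/Text:"([^"]*)"/);
// single[0] is 'Text:"one"', single[1] is 'one'

// With g: every full match, but no capture groups.
const all = s.match(/Text:"([^"]*)"/g);
// all is ['Text:"one"', 'Text:"two"']

// One way to keep g and still get the captured text: re-match each item
// without g and take group 1.
const captured = all.map(item => item.match(/^Text:"([^"]*)"$/)[1]);
// captured is ['one', 'two']
```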
2
<script type="text/javascript">
var str = 'Text:"How secure is my information?"';
var obj = eval('({'+str+'})')
console.log(obj.Text);
</script>
Sjoerd
  • Modern browsers also have [JSON.parse](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/JSON/parse), which may be preferred over `eval`. – Sjoerd Jun 20 '12 at 12:39
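JSON.parse does not accept the bare key in Text:"...", so the key has to be quoted first. A sketch of that variation, assuming the input is a single key:"value" pair whose value is already a valid JSON string (no unescaped inner quotes); parsePair is a hypothetical name. Unlike eval, malformed input makes JSON.parse throw instead of running code:

```javascript
// Hypothetical helper: parses a single key:"value" pair with JSON.parse.
function parsePair(str) {
  // JSON requires quoted keys: Text:"..." becomes "Text":"...".
  const json = '{' + str.replace(/^(\w+):/, '"$1":') + '}';
  return JSON.parse(json); // throws SyntaxError on malformed input
}

parsePair('Text:"How secure is my information?"').Text;
// -> 'How secure is my information?'
```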
1

Here is an example showing how you can approach this.

1) Given this input string:

const inputText = 
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

2) Extract the data in double quotes after the literal Text: so that the result is an array with all matches, like so:

["How secure is my information?",
 "How to improve this?",
 "OK just like in the \"Hackers\" movie."]

SOLUTION

function getText(text) {
  // match() returns null when there are no matches, so fall back to [].
  return (text.match(/Text:".*"/g) || [])
    .map(item => item.match(/^Text:"(.*)"/)[1]);
}

console.log(JSON.stringify(getText(inputText)));

Piotr Berebecki
1

If you, like me, get here while researching a bug related to the Cloudinary gem, you may find this useful:

Cloudinary recently released version 1.16.0 of their gem. In Safari, it crashes with the error 'Invalid regular expression: invalid group specifier name', which is the error Safari reports for lookbehind assertions it does not support.

A bug report has been filed. In the meantime I reverted to 1.15.0 and the error went away.

Hope this saves someone some time.

de.
0

A regular expression with lookbehind

regex = /(?<=.*?:).*/g

can be used to produce an array with all matches found in the inputText (from Piotr Berebecki's answer):

> inputText.match(regex)
[
  '"How secure is my information?"someRandomTextHere',
  '"Not very much"',
  '"How to improve this?"',
  `"Don't use '123456' for your password"`,
  '"OK just like in the "Hackers" movie."'
]

Each match consists of everything after the first colon in a line: the quoted string plus any trailing text.

In the absence of lookbehinds, a regular expression with groups can be used:

regex = /(.*?:)(.*)/g

With this, each match consists of a complete line, with two groups: the first containing the part up to the colon and the second containing the rest.

> inputText.match(regex)
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Voice:"Not very much"',
  'Text:"How to improve this?"',
  `Voice:"Don't use '123456' for your password"`,
  'Text:"OK just like in the "Hackers" movie."'
]

To see the groups, you can use the .exec method. The first match looks like this:

> [...regex.exec(inputText)]
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Text:',
  '"How secure is my information?"someRandomTextHere'
]

To loop over all matches and process only the second group of each (that is, the part after the colon from each line), use something like:

> for (var m, regex = /(.*?:)(.*)/g; m = regex.exec(inputText); ) console.log(m[2]);
"How secure is my information?"someRandomTextHere
"Not very much"
"How to improve this?"
"Don't use '123456' for your password"
"OK just like in the "Hackers" movie."
Heiko Theißen
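The exec loop in the last answer can also be written with String.prototype.matchAll (ES2020), which yields each match with its groups, so no manual lastIndex handling is needed. The sketch below also keeps only lines whose first group is Text: and strips the outer quotes, which is an addition for this question; it assumes each value starts with a quote. textValues is a hypothetical name, and the input is the sample from Piotr Berebecki's answer:

```javascript
// Sample input from Piotr Berebecki's answer.
const inputText = `Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

// Hypothetical helper: values of Text: lines only, outer quotes removed.
function textValues(input) {
  return Array.from(input.matchAll(/(.*?:)(.*)/g))
    .filter(m => m[1] === 'Text:')                   // group 1: part up to the colon
    .map(m => m[2].slice(1, m[2].lastIndexOf('"'))); // between first and last quote
}

textValues(inputText);
// -> ["How secure is my information?", "How to improve this?",
//     'OK just like in the "Hackers" movie.']
```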