
I'm a bit stuck on the following task. I want to select text between double quotes (") that is inside a tag but not outside of it, i.e. a selection inside another selection.

My tag delimiters are <| and |>, and I want to select text only if it is between quotes and also between the tags:

<| blah blah blah "should be selected" not selected "select it too" |> "not selected too"

I tried something like

(\<\|)(\").*?(\")(\|\>)   

But it doesn't work.
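A quick check shows why (a sketch, runnable in a browser console or Node.js; the variable names are illustrative): the pattern requires `"` to follow `<|` immediately and `|>` to follow the closing `"` immediately, so it finds nothing on the sample line. Even when it does match, it returns the whole tag rather than just the quoted text.

```javascript
// The attempted pattern from the question, with the g flag added.
var attempt = /(\<\|)(\").*?(\")(\|\>)/g;

// In the sample, "<|" is followed by a space, not by a quote.
var sample = '<| blah blah blah "should be selected" not selected "select it too" |> "not selected too"';
console.log(sample.match(attempt)); // null: no match at all

// It only matches when the quotes sit directly against the tag markers,
// and then the match is the whole tag, not just the quoted text.
console.log('<|"x"|>'.match(attempt)); // [ '<|"x"|>' ]
```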

magallanes

4 Answers


I've got it to match correctly using two regexes.

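var input = '<|a "b"|>c "d"ef<|"g"h "i"|>"j"k l';
var output = (input.match(/<\|(.*?)\|>/g) || [])             // step 1: every <| ... |> block
  .map(function (x) { return x.match(/"(.*?)"/g) || []; })   // step 2: quoted text in each block ([] if none)
  .reduce(function (all, quotes) { return all.concat(quotes); }, []); // flatten into one array
alert(output); // "b","g","i"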

As you can see, it correctly matches "b", "g" and "i".

The principle:

  1. find all the matches of text between <| and |>
  2. for every match from the first step, find matches of text between two quotes.

(I used the regex from the second answer to the linked question.)

nicael

This will do the job in a single regex:

(?<=<\|[^>]*)"[^"]*"

Following up on a comment by nicael: the input string might not be tagged correctly (for example, a closing |> written as a lone >). This version handles that:

(?<=<\|((?!\|>).)*)"[^"]*"
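Modern JavaScript engines (ES2018 and later, e.g. current Node.js) do support lookbehind, and ECMAScript lookbehind may contain arbitrary-length patterns like the two above. Assuming such an engine, this sketch runs both patterns against nicael's counterexample, where the first `|>` was replaced by a lone `>`: the first pattern misses `"not selected too"`, the second picks it up.

```javascript
// Both lookbehind patterns from the answer, with the g flag added.
var simple = /(?<=<\|[^>]*)"[^"]*"/g;
var robust = /(?<=<\|((?!\|>).)*)"[^"]*"/g;

// nicael's counterexample: the first "|>" became a lone ">",
// so the first tag now runs until the "|>" after "test".
var input = '<| blah blah blah "should be selected" not selected "select it too" > "not selected too" <| "test" |> "wrong match" ';

// simple stops at any ">", so "not selected too" is lost.
console.log(input.match(simple));
// [ '"should be selected"', '"select it too"', '"test"' ]

// robust only stops at a real "|>".
console.log(input.match(robust));
// [ '"should be selected"', '"select it too"', '"not selected too"', '"test"' ]
```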

If you need to use it with JavaScript, which at the time did not support lookbehind, this lookahead-only version works:

(?=("[^"]*"[^"]*)*$)"[^"]*"(?=((?!<\|).)*\|>)
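A sketch of the lookahead-only pattern in use. The first lookahead succeeds only when an even number of `"` characters remain from the current position, i.e. at an opening quote; the second requires a `|>` to appear before the next `<|`, i.e. the quote sits inside an open tag. It is checked here against the question's sample and nicael's test string.

```javascript
// Lookahead-only pattern from the answer, with the g flag added.
var insideTag = /(?=("[^"]*"[^"]*)*$)"[^"]*"(?=((?!<\|).)*\|>)/g;

var question = '<| blah blah blah "should be selected" not selected "select it too" |> "not selected too"';
var nicael = '<|a "b"|>c "d"ef<|"g"h "i"|>"j"k l';

console.log(question.match(insideTag)); // [ '"should be selected"', '"select it too"' ]
console.log(nicael.match(insideTag));   // [ '"b"', '"g"', '"i"' ]
```

Because the parity check counts quotes all the way to the end of the string, a single unbalanced `"` after a quoted section makes every opening quote before it fail.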

Sebastian Schumann
  • Looks great, but why doesn't it work [there](http://regexr.com/3cet3)? I'm on mobile and can't get the error to display. – nicael Dec 20 '15 at 20:47
  • Found a problem in your regex: [this](http://regexstorm.net/tester?p=(%3f%3c%3d%3c%5c%7c%5b%5e%3e%5d*)%22%5b%5e%22%5d*%22&i=%3c%7c+blah+blah+blah+%22should+be+selected%22+not+selected+%22select+it+too%22+%3e+%22not+selected+too%22+%3c%7c+%22test%22+%7c%3e+%22wrong+match%22+) doesn't match correctly. I replaced the first |> with >, which should cause "not selected too" to match, but it doesn't. – nicael Dec 20 '15 at 20:51
  • @nicael I added a regex that solves the second problem. My regexes work with the .NET regex engine. Your first sample shows that lookbehind assertions are not allowed in JavaScript. Sorry. What? There is a regex flavor that doesn't support lookbehind? I'll try to find a solution that works with lookahead assertions. – Sebastian Schumann Dec 21 '15 at 21:02
  • @nicael Will [`(?=("[^"]*"[^"]*)*$)"[^"]*"(?=((?!<\|).)*\|>)`](http://regexr.com/3cf39) be a possible solution? It looks strange if you remove the `|` of the first closing token but maybe it's okay. – Sebastian Schumann Dec 21 '15 at 21:24
  • This one seems to be perfect! – nicael Dec 21 '15 at 21:26
  • @nicael The regex in my edited answer solves the problem you posted in the second comment. – Sebastian Schumann Dec 21 '15 at 21:28

I can't think of a single regular expression that matches what you want in one shot, but I don't see a reason not to do it with two regexps:

var SAMPLE_STRING = '<| blah blah blah "should be selected" not selected "select it too" |> "not selected too" <| "select it" do not select this |> "don\'t select this one too"';

var matchAll = function matchAll(regexp, str) {
  var lastIndex = regexp.lastIndex;
  regexp.lastIndex = 0;
  var result = [];
  var match;
  while ((match = regexp.exec(str)) !== null) {
    result.push(match[0]);
  }
  regexp.lastIndex = lastIndex; // so this method won't have any side effects on the passed regexp object
  return result;
};

var withinTagsRegexp = /<\|([^|]|\|[^>])+\|>/g;
var withinQuotesRegexp = /"[^"]+"/g;

var withinTagsAndQuotes = [].concat.apply([], // flattens the following
    matchAll(withinTagsRegexp, SAMPLE_STRING).map(
    matchAll.bind(undefined, withinQuotesRegexp)));

// show the result

var resultTag = document.getElementById('result');

withinTagsAndQuotes.forEach(function(entry) {
  var p = document.createElement('p');
  p.textContent = entry; // textContent shows the match as plain text instead of parsing it as HTML
  resultTag.appendChild(p);
});

<div id="result"></div>
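Since ES2020, `String.prototype.matchAll` covers what the hand-written `matchAll` helper does (and leaves the regex's `lastIndex` untouched), and `Array.prototype.flatMap` (ES2019) replaces the `[].concat.apply` flattening. A sketch of the same two-regex idea, assuming a modern engine and written without the DOM so it runs in Node.js; `quotedInsideTags` is a hypothetical name.

```javascript
// Hypothetical helper: returns the quoted strings that sit inside <| ... |> tags.
function quotedInsideTags(str) {
  var tagRegexp = /<\|([^|]|\|[^>])+\|>/g; // same tag regex as above
  var quoteRegexp = /"[^"]+"/g;           // same quote regex as above
  // Step 1: every tag; step 2: every quoted string inside each tag, flattened.
  return Array.from(str.matchAll(tagRegexp), function (m) { return m[0]; })
    .flatMap(function (tag) {
      return Array.from(tag.matchAll(quoteRegexp), function (m) { return m[0]; });
    });
}

var SAMPLE_STRING = '<| blah blah blah "should be selected" not selected "select it too" |> "not selected too" <| "select it" do not select this |> "don\'t select this one too"';
console.log(quotedInsideTags(SAMPLE_STRING));
// [ '"should be selected"', '"select it too"', '"select it"' ]
```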
fardjad

Try it with look-behinds and look-aheads:

(?<=\<\|.)(\"[^"]*\")(?=.\|\>)


Here's a live demo.
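As MvanGeest points out in the comments, the `.` inside each lookaround stands for exactly one character, so in a standard engine this pattern only matches when a single character separates the quotes from `<|` and `|>`. A quick check, assuming a modern JavaScript engine (ES2018+ for lookbehind); the test strings are illustrative.

```javascript
// The answer's pattern, with the g flag added.
var pattern = /(?<=\<\|.)(\"[^"]*\")(?=.\|\>)/g;

// Exactly one character (a space) on each side of the quotes: matches.
console.log('<| "test" |>'.match(pattern)); // [ '"test"' ]

// The question's sample: several characters before each quote, so no match.
var sample = '<| blah blah blah "should be selected" not selected "select it too" |> "not selected too"';
console.log(sample.match(pattern)); // null
```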

Jan Eglinger
  • @fardjad oh, you're right. The answer by nicael fails at your example, too. Do you have better suggestions? – Jan Eglinger Dec 20 '15 at 15:54
  • Yes, but not with a single regex, I'll post my solution in a minute. – fardjad Dec 20 '15 at 16:12
  • That's odd... I would expect this to match only the `"test"` part of fardjad's example (instead of matching too often), because you only allow one character between the `|` and the opening `"`, and only one character between the closing `"` and the `|`. http://pythex.org/ seems to agree... is that a Debuggex bug, a Pythex bug, or something else entirely? – MvanGeest Dec 20 '15 at 18:15