Complex regex pattern in VS Code with lookahead/lookbehind

Question

I have an XML file that I need to search for certain node combinations. Specifically, I need to find posts where both node2 and node4 are empty. Suppose the following XML:

<data>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4/>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4/>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
</data>

I tried the pattern <node2/>[\n\s\S]+?<node4/>, but it also matches from an empty node2 up to the next empty node4, even when that span crosses several posts. I think I need some lookahead/lookbehind to confine the match to the boundaries of a single post. The correct result would be to match only the 4th post.

Can someone help me with this?

For the genius that closed the topic: This has nothing to do with XML parsing... I need to find those cases in an XML to investigate the returned data in order to determine how to deal with them...

Just use **LINQ to XML** API for your task. It is available in the .Net Framework since 2007. — Yitzhak Khabinsky, Feb 26 '23 at 02:37
If you *must* use regex, you'll need a [tempered greedy token](http://www.rexegg.com/regex-quantifiers.html#tempered_greed) e.g. something like https://regex101.com/r/tHATAx/1 — Nick, Feb 26 '23 at 02:38
For vscode you need to explicitly include `\n` in the multiline bit (i.e., `\s` doesn't include `\n`), so @Nick's regex works like this: `(?:(?!` — Mark, Feb 26 '23 at 03:43
@Mark interesting... I did not know that. But then you'd think that in that case `\S` would include `\n`, because `[\s\S]` should in theory contain every character. — Nick, Feb 26 '23 at 05:04
@Nick or @Mark I know this isn't what my original question was about, but could you please help me find the correct pattern in the following example regex101.com/r/HVdE5Y/1, so that it grabs only the second post (which has the `ddddd` term)? Similarly to this question's behavior, my pattern sucks the first post as well... — Faye D., Feb 27 '23 at 04:28
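
To illustrate the tempered greedy token Nick mentions, here is a minimal Python sketch. It is not necessarily the pattern behind his regex101 link. The sample string is a shortened, well-formed version of the question's XML, and the variable names are made up for the example. The idea is that every character consumed between `<node2/>` and `<node4/>` is first checked with a negative lookahead, so the match can never step over a `</post>` boundary.

```python
import re

# Shortened, well-formed version of the question's sample (illustrative only).
xml = """<data>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4>444</node4>
  </post>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4/>
  </post>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4/>
  </post>
</data>"""

# Lazy pattern in the spirit of the question: it may run across </post>.
naive = re.compile(r"<node2/>[\s\S]+?<node4/>")

# Tempered version: (?!</post>) is tested before each consumed character,
# so the match is abandoned as soon as it would leave the current post.
tempered = re.compile(r"<node2/>(?:(?!</post>)[\s\S])+?<node4/>")

naive_hits = [m.group(0) for m in naive.finditer(xml)]
tempered_hits = [m.group(0) for m in tempered.finditer(xml)]

print(len(naive_hits), len(tempered_hits))
```

On this sample the lazy pattern's first match starts in the first post and ends in the second, while the tempered pattern matches only inside the last post. In the VS Code search box, going by Mark's comment, an explicit `\n` may need to be added to the character class for the pattern to span lines.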
