
I have an XML file that I need to search for certain node combinations. Specifically, I need to find posts where both node2 and node4 are empty. Suppose the following XML:

<data>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4/>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2/>
    <node3>333</node3>
    <node4/>
    <node5>555</node5>
  </post>
  <post>
    <node1>111</node1>
    <node2>222</node2>
    <node3>333</node3>
    <node4>444</node4>
    <node5>555</node5>
  </post>
</data>

I tried the pattern `<node2/>[\n\s\S]+?<node4/>`, but it also matches from an empty node2 up to the next empty node4, even when that span crosses several posts. I think I need some lookahead/lookbehind to confine the match to the boundaries of a single post. The correct result would be only the 4th post.
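
A minimal Python sketch that reproduces the over-matching described above. Python's `re` module and the helper `make_post` are assumptions made for illustration only; the sample is rebuilt with well-formed closing tags, and the pattern is the one quoted above, unchanged.

```python
import re

def make_post(node2, node4):
    """Build one <post>; a value of None renders as an empty <nodeN/>."""
    values = {"node1": "111", "node2": node2, "node3": "333",
              "node4": node4, "node5": "555"}
    lines = []
    for name, value in values.items():
        if value is None:
            lines.append(f"    <{name}/>")
        else:
            lines.append(f"    <{name}>{value}</{name}>")
    return "  <post>\n" + "\n".join(lines) + "\n  </post>"

# Same five posts as in the sample: only the 4th has both nodes empty.
SAMPLE = "<data>\n" + "\n".join([
    make_post("222", "444"),
    make_post(None, "444"),
    make_post("222", None),
    make_post(None, None),
    make_post("222", "444"),
]) + "\n</data>"

# The pattern from the question, unchanged.
NAIVE = re.compile(r"<node2/>[\n\s\S]+?<node4/>")

naive_matches = NAIVE.findall(SAMPLE)
for text in naive_matches:
    # A match that contains "</post>" has leaked out of the post it started in.
    print("crosses a post boundary:", "</post>" in text)
```

With this sample the pattern finds two matches: the first starts at the empty node2 of the 2nd post and ends at the empty node4 of the 3rd, and only the second stays inside the 4th post.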

Can someone help me with this?

For the genius that closed the topic: This has nothing to do with XML parsing... I need to find those cases in an XML to investigate the returned data in order to determine how to deal with them...

Faye D.
  • Just use **LINQ to XML** API for your task. It is available in the .Net Framework since 2007. – Yitzhak Khabinsky Feb 26 '23 at 02:37
  • If you *must* use regex, you'll need a [tempered greedy token](http://www.rexegg.com/regex-quantifiers.html#tempered_greed), e.g. something like https://regex101.com/r/tHATAx/1 – Nick Feb 26 '23 at 02:38
  • @Nick for some reason it doesn't work in VS.Code – Faye D. Feb 26 '23 at 02:42
  • What language are you using? – Nick Feb 26 '23 at 02:55
  • For vscode you need to explicitly include `\n` in the multiline bit (i.e., `\s` doesn't include `\n`), so @Nick's regex works like this: `(?:(?!` – Mark Feb 26 '23 at 03:43
  • @Mark interesting... I did not know that. But then you'd think that in that case `\S` would include `\n`, because `[\s\S]` should in theory contain every character. – Nick Feb 26 '23 at 05:04
  • @Nick or @Mark I know this isn't what my original question was about, but could you please help me find the correct pattern for the example at regex101.com/r/HVdE5Y/1, so that it grabs only the second post (the one with the `ddddd` term)? As with this question, my pattern swallows the first post as well... – Faye D. Feb 27 '23 at 04:28
  • Perhaps https://regex101.com/r/HVdE5Y/2 – Nick Feb 27 '23 at 04:34
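
For reference, a minimal Python sketch of the tempered greedy token idea mentioned in the comments. It is not necessarily the pattern behind either regex101 link; Python's `re` module, the helper `make_post`, and the names `STAY_IN_POST` and `TEMPERED` are assumptions made for illustration. It also assumes that node2 always precedes node4 inside a post, as in the sample.

```python
import re

def make_post(node2, node4):
    """Build one <post>; a value of None renders as an empty <nodeN/>."""
    values = {"node1": "111", "node2": node2, "node3": "333",
              "node4": node4, "node5": "555"}
    lines = []
    for name, value in values.items():
        if value is None:
            lines.append(f"    <{name}/>")
        else:
            lines.append(f"    <{name}>{value}</{name}>")
    return "  <post>\n" + "\n".join(lines) + "\n  </post>"

# Same five posts as in the question: only the 4th has both nodes empty.
SAMPLE = "<data>\n" + "\n".join([
    make_post("222", "444"),
    make_post(None, "444"),
    make_post("222", None),
    make_post(None, None),
    make_post("222", "444"),
]) + "\n</data>"

# Tempered token: consume one character at a time, but only while the text
# ahead is not "</post>", so the match can never leave the current post.
STAY_IN_POST = r"(?:(?!</post>)[\s\S])*?"

TEMPERED = re.compile(
    "<post>" + STAY_IN_POST + "<node2/>"
    + STAY_IN_POST + "<node4/>"
    + STAY_IN_POST + "</post>"
)

# Keep each match's start offset so the matching post can be located.
hits = [(m.start(), m.group()) for m in TEMPERED.finditer(SAMPLE)]
for start, text in hits:
    post_number = SAMPLE.count("<post>", 0, start) + 1
    print("matched post number", post_number)
```

On this sample the only match is the 4th post. In an editor whose `\s` does not cover newlines (as noted above for vscode), the character class would need `\n` added explicitly.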
