I don't understand how to capture <sub>aaaa</sub> rather than <sub>eeee</sub> in the second match.

my regex:

<item>.*?<sub>(.*?)<\/sub>.*?<value>(.*?)<\/value>.*?<\/item>

content:

<item> fffffffffffff
<sub>aaaa</sub>
<value>111</value>
</item>

<item>
<sub>eeee</sub> arg34ddddddddddddddd
<atag>ddd</atag>
<sub>aaaa</sub>
<atag>dddg</atag>
<value>222</value>
</item>

Can I get it in one step, or do I need to run a regex several times?

UPDATE

I want to get the result like this:

[ [ 'aaaa', 111],['aaaa', 222] ]

Is it possible?

TigerTV.ru

1 Answer

Try

<item>[\s\S]*?<sub>(.*?)<\/sub>((?!<sub>)[\s\S])*<\/item>

This captures only the last sub inside each item.

Explanation:

  • <item>[\s\S]*?<sub> lazily matches anything between the opening item tag and a sub tag
  • <sub>(.*?)<\/sub> matches the sub tag and captures its content
  • ((?!<sub>)[\s\S])*<\/item> uses a tempered greedy token to ensure that, after the sub matched above, there are no more sub tags before the closing item tag
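
The pattern above captures only the sub, while the question also asks for the value. As a rough sketch (not the pattern from the demo), the same tempered-token idea can be split around a `<value>` group. Python is an assumption here, since the question does not name a language; `content`, `pattern`, and `pairs` are illustrative names.

```python
import re

# Sample input copied from the question.
content = """<item> fffffffffffff
<sub>aaaa</sub>
<value>111</value>
</item>

<item>
<sub>eeee</sub> arg34ddddddddddddddd
<atag>ddd</atag>
<sub>aaaa</sub>
<atag>dddg</atag>
<value>222</value>
</item>
"""

pattern = re.compile(
    # Lazily skip to a <sub> and capture its content.
    r"<item>[\s\S]*?<sub>(.*?)</sub>"
    # Tempered token: advance to <value> without stepping over another <sub>.
    r"(?:(?!<sub>)[\s\S])*?<value>(.*?)</value>"
    # Same restriction up to the closing tag, so the captured sub is the last one.
    r"(?:(?!<sub>)[\s\S])*?</item>"
)

# findall returns (sub, value) tuples; convert the value to int as in the question.
pairs = [[sub, int(value)] for sub, value in pattern.findall(content)]
print(pairs)
```

This assumes every item contains a `<value>` after its last `<sub>`; if one does not, the lazy parts can run past `</item>` into the next item, so the sketch is only safe for input shaped like the sample.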
mrzasa
  • What if the first `aaaa` doesn't have a following `<value>`? – revo Mar 03 '18 at 22:11
  • You can make that part optional: `<item>[\S\s]*?<sub>(.*?)<\/sub>[\s\S]*?(<value>(.*?)<\/value>[\S\s]*?)?<\/item>` [Demo](https://regex101.com/r/CdEuTl/3/) but is that the case in the OP's problem? If we need to deal with something more complex (more optional values), I'd rather go with an XML parser instead. – mrzasa Mar 03 '18 at 22:14
  • That's the point. The requirements are not clear either, and in your demo `eeee` is matched, which it shouldn't be. – revo Mar 03 '18 at 22:16
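
For the XML parser route mentioned in the comments, a minimal sketch with Python's standard `xml.etree.ElementTree` might look like the following. It assumes the input is well-formed apart from lacking a single root, so the items are wrapped in a hypothetical `<root>` element, and it assumes "the right sub" means the last `<sub>` child of each item.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question.
content = """<item> fffffffffffff
<sub>aaaa</sub>
<value>111</value>
</item>

<item>
<sub>eeee</sub> arg34ddddddddddddddd
<atag>ddd</atag>
<sub>aaaa</sub>
<atag>dddg</atag>
<value>222</value>
</item>
"""

# The fragment has several top-level elements, so wrap it in a made-up root.
root = ET.fromstring(f"<root>{content}</root>")

pairs = []
for item in root.iter("item"):
    subs = item.findall("sub")   # direct <sub> children, in document order
    value = item.find("value")   # first <value> child, or None if absent
    if subs and value is not None:
        # Keep the last <sub> of the item, paired with its numeric value.
        pairs.append([subs[-1].text, int(value.text)])

print(pairs)
```

Items missing either tag are skipped rather than partially matched, which sidesteps the optional-group question raised above.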