I have an XML file that looks something like this:
<Table>
<Persons>
<Person>
<ID>71</ID>
<FullNameLikeX>"sentence expected"</FullNameLikeX>
<Age>49</Age>
<FavoriteFood>Banana</FavoriteFood>
<NameParts>
<word>Jhon</word>
<word>Henry</word>
<word>Abbot</word>
</NameParts>
</Person>
<Person>
<ID>72</ID>
<FullNameLikeX>"sentence expected"</FullNameLikeX>
<Age>26</Age>
<FavoriteFood>Cake</FavoriteFood>
<NameParts>
<word>Cecilia</word>
<word>Elisabeth</word>
<word>Maria</word>
<word>Smith</word>
</NameParts>
</Person>
<Person>
<ID>73</ID>
<FullNameLikeX>"sentence expected"</FullNameLikeX>
<Age>17</Age>
<FavoriteFood>Lasagna</FavoriteFood>
<NameParts>
<word>Luc</word>
<word>Hernandez</word>
</NameParts>
</Person>
</Persons>
</Table>
I am trying to replace the "sentence expected" part with the actual sentence (for the first person here, that would give "Jhon Henry Abbot like Banana") using a regular expression in a text editor (Notepad++). My problem is that I can't find a way to deal with the varying number of "word" tags within the "NameParts" tag without a group either overreaching into the next "Person" tag or ending up empty.
I came up with this regular expression:
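To make the expected result concrete, here is a minimal sketch in Python that produces the output I am after. It is not the Notepad++ approach I am asking about; it only illustrates the target. It assumes the structure shown above (one "FullNameLikeX", one "FavoriteFood" and one "NameParts" per "Person"), and the name `fill_sentences` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample above (persons 71 and 73 only).
SAMPLE = """<Table>
<Persons>
<Person>
<ID>71</ID>
<FullNameLikeX>"sentence expected"</FullNameLikeX>
<Age>49</Age>
<FavoriteFood>Banana</FavoriteFood>
<NameParts>
<word>Jhon</word>
<word>Henry</word>
<word>Abbot</word>
</NameParts>
</Person>
<Person>
<ID>73</ID>
<FullNameLikeX>"sentence expected"</FullNameLikeX>
<Age>17</Age>
<FavoriteFood>Lasagna</FavoriteFood>
<NameParts>
<word>Luc</word>
<word>Hernandez</word>
</NameParts>
</Person>
</Persons>
</Table>"""


def fill_sentences(xml_text):
    """Hypothetical helper: fill each FullNameLikeX with '<words> like <food>'."""
    root = ET.fromstring(xml_text)
    for person in root.iter("Person"):
        # Collect however many <word> values this person has.
        words = [w.text for w in person.findall("./NameParts/word")]
        food = person.findtext("FavoriteFood")
        # Keep the surrounding double quotes, as in the original file.
        person.find("FullNameLikeX").text = '"{} like {}"'.format(" ".join(words), food)
    return root


root = fill_sentences(SAMPLE)
sentences = [p.findtext("FullNameLikeX") for p in root.iter("Person")]
print(sentences)
# ['"Jhon Henry Abbot like Banana"', '"Luc Hernandez like Lasagna"']
```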
(<FullNameLikeX>")[\s\S]*?("<\/FullNameLikeX>)([\s\S]*?<FavoriteFood>([\s\S]*?)<\/FavoriteFood>[\s\S]*?<NameParts>###[\s\S]*?<\/NameParts>)
In place of ###, I have already tried placing multiple copies (from 1 to 4) of each of the following:
(?:[\s\S]*?<word>([\s\S]*?)<\/word>)?
With this one, the groups end up reaching into the next Person when there are fewer words than copies of the group.
(?:[\s\S]*?<word>([\s\S]*?)<\/word>)??
This one does not reach into the next Person, but none of the groups capture anything.
(?:[\s\S]*?<word>([\s\S]*?)<\/word>)+?
Here too, the groups end up reaching into the next Person when there are fewer words than copies of the group.
(?:[\s\S]*?<word>([\s\S]*?)<\/word>(?![\s\S]*?<\/Person>[\s\S]*?))?
This one does not reach into the next Person, but the capture groups are somehow empty.
So basically, some groups always either attempt one iteration even when they should not, and end up overreaching into the next Person tag, or they perform zero iterations when they should perform one.
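Both behaviours can be reproduced outside Notepad++. The sketch below uses Python's `re` module, on the assumption that it treats these particular constructs the same way as Notepad++'s engine. It only reproduces the problem and is not a fix. The pattern is cut down to the "NameParts" portion, the helper name `first_match_groups` is made up for this example, and Python reports an unmatched group as `None` where Notepad++ shows it as empty.

```python
import re

# Two shortened <Person> entries: the first has 2 words, the second has 3.
TEXT = """<Person>
<NameParts>
<word>Luc</word>
<word>Hernandez</word>
</NameParts>
</Person>
<Person>
<NameParts>
<word>Jhon</word>
<word>Henry</word>
<word>Abbot</word>
</NameParts>
</Person>"""

# Greedy optional group (the first variant I tried).
WORD = r"(?:[\s\S]*?<word>([\s\S]*?)</word>)?"
# Same group with the negative lookahead (the last variant I tried).
GUARDED = r"(?:[\s\S]*?<word>([\s\S]*?)</word>(?![\s\S]*?</Person>[\s\S]*?))?"


def first_match_groups(unit, copies=3):
    """Hypothetical helper: repeat `unit` and return the first match's groups."""
    pattern = "<NameParts>" + unit * copies + r"[\s\S]*?</NameParts>"
    return re.search(pattern, TEXT).groups()


# Three copies against a person with only two words:
print(first_match_groups(WORD))     # ('Luc', 'Hernandez', 'Jhon')  <- third group overreaches
print(first_match_groups(GUARDED))  # (None, None, None)            <- nothing captured
```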
Is there a way to capture a varying number of XML tag values without reaching into another tag using just a regular expression, or is it simply not possible?
PS: This XML file is just a look-alike. The actual file is much longer and its tag names and values are obscured; I replaced them with simple ones for readability, but the format of the file stays the same. (Also, there never seem to be fewer than 1 or more than 5 "word" tags per "NameParts" tag, in case that helps.)