Hi, I have XML text like this:

<w:p> abc </w:p>
<w:p> def </w:p>
<w:tr #A1b2c3> <w:p> 123 </w:p> </w:tr>
<w:tr #C1d2e3> <w:p> 456 </w:p> </w:tr>
<w:p> ghi </w:p>

I need to extract all paragraphs like abc, except those inside a table row, like 123. Any help, please?

SAliaMunch
  • 7
    Use an XML parser instead – Code Maniac Sep 24 '19 at 12:00
  • 1
    `xpath` or `XElement` should give you access to the parent node, so you can check from there if the node is within a `w:tr` node. – npinti Sep 24 '19 at 12:01
  • I have to do it with regex; it is a requirement – SAliaMunch Sep 24 '19 at 12:03
  • 2
    @SAliaMunch - It doesn't look like XML - what's the `#A1b2c3` stuff? – Enigmativity Sep 24 '19 at 12:04
  • @SAliaMunch - Choosing to use Regex for parsing XML might end up like using a wood saw for performing open heart surgery. – Enigmativity Sep 24 '19 at 12:06
  • [this should do the trick](https://regex101.com/r/bLpOS1/2) – Innat3 Sep 24 '19 at 12:14
  • If "use an XML parser" is not detailed enough: [XmlDocument.Load](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocument.load?view=netframework-4.8) and [XmlNode.SelectNodes](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlnode.selectnodes?view=netframework-4.8) give you `var xdoc = new XmlDocument(); xdoc.Load(something); var list = xdoc.SelectNodes("//p");`. Three lines, end of story. [xpath](https://learn.microsoft.com/fr-fr/previous-versions/ms256086(v=vs.120)?redirectedfrom=MSDN) – xdtTransform Sep 24 '19 at 12:17
  • `((.*)<\/w:p>)(?!.*<\/w:tr>)` will try to match all `<w:p> ... </w:p>` elements except those with a `</w:tr>` found later on the same line. It still won't work if your `<w:tr>` elements are on separate lines – ArcX Sep 24 '19 at 12:50
  • Thank you @Innat3, the regex you wrote works fine for me! – SAliaMunch Sep 24 '19 at 13:31
  • @SAliaMunch - What is this format? It's not XML. – Enigmativity Sep 24 '19 at 21:37
  • @Enigmativity, the tags contain a lot of details that are not important to the question, so I replaced them with very short random text to keep the focus on what I want. – SAliaMunch Sep 25 '19 at 08:18

1 Answer


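That doesn't look like valid XML, so regex could be your only option. With the multiline flag enabled, so that `^` and `$` match at every line, this matches only the paragraphs that sit on a line of their own: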

^<w:p>(.*?)<\/w:p>$

https://regex101.com/r/QsS3tW/1
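
To see the pattern at work outside regex101, here is a minimal sketch in Python, used only for illustration since the question does not show any host code. The sample text is copied from the question; `SAMPLE`, `TOP_LEVEL_P` and `extract_paragraphs` are names made up for this example, and the sketch assumes every table row sits on a single line, as in the sample.

```python
import re
from typing import List

# Sample data copied from the question (not well-formed XML).
SAMPLE = """\
<w:p> abc </w:p>
<w:p> def </w:p>
<w:tr #A1b2c3> <w:p> 123 </w:p> </w:tr>
<w:tr #C1d2e3> <w:p> 456 </w:p> </w:tr>
<w:p> ghi </w:p>
"""

# re.MULTILINE makes ^ and $ match at every line boundary, so only
# <w:p> elements that occupy a whole line on their own are matched.
TOP_LEVEL_P = re.compile(r"^<w:p>(.*?)</w:p>$", re.MULTILINE)


def extract_paragraphs(text: str) -> List[str]:
    """Return the trimmed text of each <w:p> that fills an entire line."""
    return [content.strip() for content in TOP_LEVEL_P.findall(text)]


print(extract_paragraphs(SAMPLE))  # prints ['abc', 'def', 'ghi']
```

This depends entirely on the line layout: a `<w:p>` that sits on its own line inside a multi-line `<w:tr>` would still be matched, so it is only safe if the real data is laid out like the sample.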


You may also want to check whether a parser already exists for this format: since the data exists, some system must be producing or consuming it.

MonkeyZeus