
I am trying to use XPath to search for nodes within an HTML block.

I have found that the HTML can contain arbitrary XML-like nodes, e.g. `<john></john>` or `<john@gmail.com></john@gmail.com>`.

How do I structure the XPath query to find all of these instances in a single search?

I've tried the following, but without luck:

 //john@gmail.com|//anotheremail@gmail.com|//john|//anotheremail
 //john@gmail.com|anotheremail@gmail.com|john|anotheremail
 //john@gmail.com or anotheremail@gmail.com or john or anotheremail

None of these produces the expected result set.

If I search for them individually, I can get matches.

What am I doing wrong here?

Solvision
  • Isn't it //(john@gmail.com|another@gmail.com)? Just guessing. I'm on my phone and can't test right now – patrick Sep 07 '17 at 23:52
  • Since you've tagged this with `php`, how are you parsing this? `DOMDocument` does **not** like those node names. – Phil Sep 08 '17 at 00:01
  • `<john@gmail.com>` is not a well-formed start tag in XML -- element names cannot have `@` in them -- so don't expect XPath or any other XML tools to help here until you fix that. – kjhughes Sep 08 '17 at 00:26
  • Unfortunately, some mail client that sent this email (the email body is the HTML) inserted these tags to mark the start of the history/quote. So I can't fix it; I need to be able to handle it, which is what I'm trying to do now. – Solvision Sep 08 '17 at 00:35
  • Yes, I'm using PHP and DOMDocument. If it won't accept or find nodes with @ in the name, is there a way to search for them some other way, but still using an XPath query? – Solvision Sep 08 '17 at 00:37
  • If you can't fix it at the source, pre-process it **as a text file.** If you can't repair it as a text file to be well-formed XML, then you can't use XML libraries. To be conformant, an XML library ***necessarily*** has to reject tags such as `john@gmail.com`. – kjhughes Sep 08 '17 at 01:35
  • Possible duplicate of [How to parse invalid (bad / not well-formed) XML?](https://stackoverflow.com/questions/44765194/how-to-parse-invalid-bad-not-well-formed-xml) – kjhughes Sep 08 '17 at 01:38
  • @kjhughes “rule breaking is rarely bound by rules”: what a great phrase. – ishegg Sep 08 '17 at 02:16
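kjhughes's suggestion to pre-process the markup as text can be sketched as follows. This is a minimal illustration in Python, not the asker's PHP setup, and it rests on assumptions: the regex only recognizes attribute-free start/end tags whose name contains `@`, and `email-marker`/`addr` are hypothetical names chosen for the rewrite. Once the tags are renamed, an ordinary XPath query such as `//email-marker` becomes possible, and the original address is still available in the attribute.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Heuristic (an assumption, not a general HTML fix): a start or end tag with
# no attributes whose name contains "@", e.g. <john@gmail.com> or </john@gmail.com>.
EMAIL_TAG = re.compile(r"<(/?)([^\s<>/@]+@[^\s<>/]+)>")


def neutralize_email_tags(markup):
    """Rewrite <addr>...</addr> into a hypothetical, valid <email-marker> element.

    The original address is kept in an `addr` attribute so it can still be queried.
    """
    def repl(match):
        closing, addr = match.group(1), match.group(2)
        if closing:
            return "</email-marker>"
        # quoteattr escapes and quotes the value, e.g. '"john@gmail.com"'
        return "<email-marker addr=%s>" % quoteattr(addr)

    return EMAIL_TAG.sub(repl, markup)


sample = "<div><p>Hi</p><john@gmail.com><p>quoted</p></john@gmail.com><john>x</john></div>"
root = ET.fromstring(neutralize_email_tags(sample))
for marker in root.iter("email-marker"):
    print(marker.get("addr"))  # prints: john@gmail.com
```

`ET.fromstring` only accepts well-formed XML; real email HTML often is not, so in practice the rewritten string would likely need to go to a more lenient HTML parser instead (such as the `DOMDocument::loadHTML` the asker already uses).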

1 Answer


The wording of your question sounds a bit vague, but I suppose you'd like to get all nodes in your HTML block? Isn't that as simple as an XPath of `//*`?
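`//*` can also be narrowed by name so that a single query returns only the marker nodes. In XPath 1.0, `|` joins complete location paths, so each branch needs its own `//` (e.g. `//john | //anotheremail`; in the second attempt above, the branches without `//` are relative child steps). `or` combines booleans, not node-sets. A predicate form is `//*[local-name()='john' or local-name()='anotheremail']`. This only works for valid names: a branch containing `john@gmail.com` is a syntax error, because `@` is the attribute-axis abbreviation. Python's standard-library ElementTree has no `|` or `local-name()`, so the sketch below emulates the predicate form. It assumes a namespace-free document and uses a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def select_by_names(root, names):
    """Return elements whose tag is in `names`, in document order.

    Rough ElementTree counterpart of the XPath 1.0 query
    //*[local-name()='john' or local-name()='anotheremail'],
    assuming no namespaces (ElementTree reports namespaced tags as
    '{uri}local', which this simple comparison would not match).
    """
    wanted = set(names)
    # root.iter() yields the root element and all of its descendants in
    # document order, i.e. the elements that //* selects.
    return [el for el in root.iter() if el.tag in wanted]


# Valid element names only: tags containing "@" must be rewritten before
# the markup can be parsed at all.
doc = ET.fromstring(
    "<div><john>a</john><p><anotheremail>b</anotheremail></p><john>c</john></div>"
)
hits = select_by_names(doc, ["john", "anotheremail"])
print([(el.tag, el.text) for el in hits])
# prints: [('john', 'a'), ('anotheremail', 'b'), ('john', 'c')]
```

Filtering a single `root.iter()` pass keeps results in document order. Running separate queries and concatenating them would group results by name instead.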

Simon Baars