
I'm trying to find the color that the CSS sets for a span inside a link in the following HTML example, using DOMDocument/XPath:

   <html>
      <head>
          <style>
             a span{
                color: #21d;
             }
          </style>
      </head>
      <body>
          <a href='test.html'>this is a <span>test</span></a>
      </body>
   </html>

I can get all the CSS with the XPath '//style' ($css = $path->query( '//style' )->item(0)->nodeValue) and then use preg_match to extract the result, but I wonder whether there is a way to get this color using XPath alone, and if so, how.

patrick
  • Let's see some code to show us what you've tried. XPath is used to traverse the DOM; it's not a CSS parser. – miken32 Oct 15 '18 at 20:21
  • @miken32, I get the CSS with $css = $path->query( '//style' )->item(0)->nodeValue, like in the question... – patrick Oct 15 '18 at 20:24
  • @miken32, it's not a duplicate of that post... That post asks how to write a regexp; this question is about solving it using XPath, if that's even possible – patrick Oct 15 '18 at 20:28
  • It's not possible, as I said. – miken32 Oct 15 '18 at 20:29
  • You can do it with Selenium using XPath; however, as @miken32 said, not with DOMDocument or any other PHP library that uses libxml. Those are meant for raw parsing of XML. – Cemal Oct 15 '18 at 20:51

1 Answer

XPath is not particularly well suited to this kind of task, but contrary to what the comments suggest, it is possible using evaluate() and nested string functions such as substring-before() and substring-after():

$html = '
    <html>
      <head>
          <style>
             a span{
                background-color: #ddd;
                color: #21d;
             }
          </style>
      </head>
      <body>
          <a href="test.html">this is a <span>test</span></a>
      </body>
   </html>
';

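$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// evaluate() can return a plain string, so XPath string functions work here
// (query() only returns node lists).
$result = $xpath->evaluate("
    substring-before(
        substring-after(
            substring-after(
                normalize-space(//style/text())
            , 'a span')
        ,' color:')
    ,';')
");
// trim() drops the space left between "color:" and the value.
echo trim($result);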

OUTPUT:

#21d

Working from the inside out:

  1. Normalize the whitespace in the style text.
  2. Get the part of the style text after your selector.
  3. Get the text after the property name in question. Note the space before ' color:', which avoids matching background-color or the like. Normalizing the whitespace in step 1 makes this work even if color: was preceded by a tab or a newline.
  4. Get the string before the first ; that follows, which is the one ending the color declaration.
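
To make the four steps easier to follow, here is a small sketch that evaluates each nesting level on its own and prints the intermediate string. The $step1 to $step4 variable names are only for illustration; the expressions are the same ones used above.

```php
<?php
// Illustrative walkthrough: build the nested expression one level at a time
// and print what each level returns.
$html = '<html><head><style>
    a span{
        background-color: #ddd;
        color: #21d;
    }
</style></head><body><a href="test.html">this is a <span>test</span></a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$step1 = "normalize-space(//style/text())";      // collapse whitespace
$step2 = "substring-after($step1, 'a span')";    // text after the selector
$step3 = "substring-after($step2, ' color:')";   // text after the property name
$step4 = "substring-before($step3, ';')";        // text up to the next semicolon

foreach ([$step1, $step2, $step3, $step4] as $i => $expression) {
    // var_export() quotes the string, which makes leading spaces visible.
    echo 'step ', $i + 1, ': ', var_export($xpath->evaluate($expression), true), PHP_EOL;
}
// step 1: 'a span{ background-color: #ddd; color: #21d; }'
// step 2: '{ background-color: #ddd; color: #21d; }'
// step 3: ' #21d; }'
// step 4: ' #21d'
```

The last step still carries the space that followed color:, so the value needs an outer normalize-space() in XPath or a trim() in PHP before it is compared with anything.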

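There are plenty of potential points of failure here: for example, 'a span' also matches inside a longer selector such as 'ul a span', and minified CSS with no whitespace before color: would not match at all. I wouldn't recommend using XPath for something like this, but it's an interesting exercise all the same.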
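
For comparison, here is a rough sketch of the route the question already hints at: let XPath find the style elements and leave the CSS itself to preg_match. The function name findCssColor() and both regular expressions are my own assumptions, not a tested CSS parser; the sketch ignores comments, @media blocks, grouped selectors such as 'a span, p' and the cascade.

```php
<?php
// Hypothetical helper: XPath locates the CSS, a regexp reads it.
function findCssColor(string $html, string $selector): ?string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Concatenate the text of every <style> element.
    $css = '';
    foreach ($xpath->query('//style') as $style) {
        $css .= $style->nodeValue . "\n";
    }

    // The selector must start the stylesheet or follow "}" or ",",
    // so "ul a span" does not match when looking for "a span".
    $rulePattern = '/(?:^|[},])\s*' . preg_quote($selector, '/') . '\s*\{([^}]*)\}/';
    if (!preg_match($rulePattern, $css, $rule)) {
        return null;
    }

    // "color" must not be preceded by a word character or a hyphen,
    // which rules out background-color and similar properties.
    if (!preg_match('/(?<![\w-])color\s*:\s*([^;}]+)/i', $rule[1], $declaration)) {
        return null;
    }

    return trim($declaration[1]);
}

$html = '<html><head><style>
    a span{
        background-color: #ddd;
        color: #21d;
    }
</style></head><body><a href="test.html">this is a <span>test</span></a></body></html>';

echo findCssColor($html, 'a span'), PHP_EOL; // prints #21d
```

This also works on minified CSS such as a span{color:#21d}, which the XPath-only version cannot handle.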

  • Wonderful way of thinking outside the box. Something some other people don't seem to be able to do (the "no, not possible", downvote-and-leave kind)... Thanks for this answer, and especially for the "wouldn't recommend" part. I solved the problem by parsing the CSS with a regexp, as was my first idea. I can indeed see this solution going wrong if there's another CSS rule with 'a span' in the ID... Nevertheless, thanks for pointing out that it can be done – patrick Oct 16 '18 at 09:54