Been scratching my head at this one for way too long now...

$dom = new DOMDocument();
$dom->loadHTML( $content );

$links = $dom->getElementsByTagName( 'a' )->item( 0 );
foreach ( $links->attributes as $attribute ) {
    $name = $attribute->nodeName;
    $value = str_replace( '"', '', stripslashes( $attribute->nodeValue ) );
    echo "$name: $value<br />";
}

That is my code, which I eventually got from the question "php dom get all attributes of a node". I've also tried other methods, such as calling getAttribute() for a single attribute to see if that would work, but got the same result.

The HTML I am attempting to go through is simply:

<a id="testid" title="testtitle" name="this is a testname" href="http://example.com/">link!</a>

I'm getting the following error:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: error parsing attribute name in Entity, line: 1

My script is outputting:

id: testid
title: testtitle
name: this
is: 
a: 
testname: 
href: http://example.com/

I should add that the output works fine if the 'name' attribute is one word.

So obviously, it must be using explode() or something stupid on spaces. Is there a way to get around this without converting all spaces to %20 or something (I have plenty of other content beyond the links and wouldn't want to convert a whole block of content)?

Andrew Ryno
  • Works for me as is. What PHP version are you using? – Álvaro González Apr 13 '11 at 07:10
  • [DOM doesn't do anything like that](http://codepad.viper-7.com/YqiUVN "Runs code above in Viper Codepad"). Please provide a code snippet that reproduces the issue. – Gordon Apr 13 '11 at 07:13
  • Using PHP 5.3.2. I think the problem is WordPress, since I'm actually doing this in a plugin. I tried both using the post content as a string and then the value WP supplies, and it worked with my own string, but not the content WP passes. Must be something in that. I might just switch to a custom shortcode. Much easier. – Andrew Ryno Apr 13 '11 at 07:31
  • You can't use spaces in the `name` attribute. – drudge Apr 13 '11 at 07:34

1 Answer

As noted in the comments, the name attribute shares the same namespace as the id attribute, whose value is defined as a "NAME token": it must begin with a letter and may otherwise contain only letters, digits, hyphens, underscores, periods and colons.

You'll note there are no spaces permitted in that list.
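To make that rule concrete, here is a minimal sketch of a check against the NAME token pattern. The helper name `isNameToken` is hypothetical, and the regex assumes the HTML 4 wording (a leading letter, then letters, digits, hyphens, underscores, colons and periods).

```php
<?php
// Hypothetical helper: true if $value fits the HTML 4 NAME token rule
// (a letter, then any number of letters, digits, "-", "_", ":" or ".").
function isNameToken(string $value): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z][A-Za-z0-9_:.-]*$/D', $value) === 1;
}

var_dump(isNameToken('testname'));           // bool(true)
var_dump(isNameToken('this is a testname')); // bool(false)
```

A check like this only tells you whether a value is a NAME token; it says nothing about how a particular parser reacts to values that are not.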

Some versions of the DOMDocument parser that PHP uses are super-strict about HTML compliance, and will whine and regularly do wrong things when confronted with spec violations. This may be one of those cases. Remove the spaces from your name attribute and see if you continue to see the problem.
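If you want to see what the parser is actually objecting to while you experiment, one option is to collect libxml's messages instead of letting them surface as warnings. The following is only a sketch: `inspectFirstLink` is a hypothetical helper name, and it assumes you pass it the exact string that your code hands to loadHTML().

```php
<?php
// Hypothetical helper: parse an HTML fragment and return the attributes of
// its first <a> element together with any messages libxml reported.
function inspectFirstLink(string $html): array
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    $attributes = [];
    $link = $dom->getElementsByTagName('a')->item(0);
    if ($link !== null) {
        foreach ($link->attributes as $attribute) {
            $attributes[$attribute->nodeName] = $attribute->nodeValue;
        }
    }

    return ['attributes' => $attributes, 'messages' => $messages];
}

$result = inspectFirstLink(
    '<a id="testid" title="testtitle" name="this is a testname" href="http://example.com/">link!</a>'
);
print_r($result);
```

Running it once with the markup typed in by hand and once with the string your application supplies should show whether the two inputs really are the same by the time they reach the parser.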

Charles
  • It should be noted that the W3C's own validator does not complain about spaces in the name attribute, and the linked specs do not clearly say whether "letters" includes spaces, only that CDATA is a sequence of characters from the "document character set". – Gordon Apr 13 '11 at 09:47