I am trying to open a Word 2007 document (docx). I unzip it successfully, but I am having an issue with the XPath portion of the code. I want to iterate over each element and grab the text within it.

In the example below I am trying to get the first element's text, to get used to the XPath system.

document.xml

<w:document>
    <w:body>
        <w:p>
            <w:r>
                <w:t>Testing</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>

PHP

$dom = new DOMDocument();
$dom->loadXML($string);
$xpath = new DomXPath($dom);
$textNodes = $xpath->query("/w:document/w:body/w:p[1]/w:r[1]/w:t[1]");
var_dump($textNodes->item(1)->textContent);
Kara
Anderson
  • If you set a variable and then check on the next line whether it is set, that check is not necessary. Also consider separating the code that extracts the zip file from the code that parses the XML; this does not have to be all in one place. Your question would also be more useful to other users who handle the unzipping differently. – hakre May 03 '13 at 16:57
  • Also, before asking about a general problem (a long description of what you do and what you don't, etc., yadda yadda yadda), check for error messages first. Give concrete information about what is going wrong. I'll leave you an answer to show this. – hakre May 03 '13 at 17:00
  • Consider giving https://github.com/PHPOffice/PHPWord a try. It might make things easier since it's specifically targeted at Word. – Gordon May 03 '13 at 17:29
  • If you have managed to get the error messages enabled you might still be puzzled a bit. Take a look at the description as well here: http://php.net/domnodelist.item – hakre May 04 '13 at 04:39

2 Answers

I assume the namespace is missing only because the example XML was shortened, and that the original document declares it. If so, the XPath query will work. The problem is that query() returns a DOMNodeList, and var_dump seems not to work for that. You can use something like:

$textNodes = $xpath->query("/w:document/w:body/w:p[1]/w:r[1]/w:t[1]");
foreach ($textNodes as $entry) {
    echo "node: {$entry->nodeName}," .
         "value: {$entry->nodeValue}\n";
}

This generates the following output (after adding a namespace declaration to your input XML):

   node: w:t,value: Testing
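
For reference, here is a minimal self-contained sketch of the same idea. It assumes the question's sample XML with the w prefix declared on the root element; the URI is the WordprocessingML main namespace that a real document.xml declares. The variable names $first and $allText are only illustrative. Note that DOMNodeList::item() is zero-based, so a list with a single match has its node at item(0).

```php
<?php
// Sample XML from the question, with the w prefix declared (assumption:
// the real document.xml carries this declaration already).
$string = <<<'XML'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p>
            <w:r>
                <w:t>Testing</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>
XML;

$dom = new DOMDocument();
$dom->loadXML($string);

$xpath = new DOMXPath($dom);
// Registering the prefix explicitly keeps the query independent of
// whichever prefix the document itself happens to use.
$xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

$textNodes = $xpath->query('/w:document/w:body/w:p[1]/w:r[1]/w:t[1]');

// item() is zero-based: the only match is item(0); item(1) is null here.
$first = $textNodes->item(0)->textContent;
echo $first, "\n";

// Collect the text of every w:t element in the document.
$allText = [];
foreach ($xpath->query('//w:t') as $node) {
    $allText[] = $node->textContent;
}
```

The second query, //w:t, is one way to iterate over every text run rather than only the first one.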
hr_117
  • **So I assume that the missing namespace is only because of the shorten example xml.** You are correct. – Anderson May 05 '13 at 11:54

Your XPath query is invalid and needs to be fixed first, because an invalid XPath query always results in an error. You cannot use its outcome to get nodes.

Unfortunately, the XPath query is invalid because the XML is invalid. So you cannot use the query (or test it further, or continue writing it) without fixing the XML first.

The XML you've provided in your question is apparently missing the namespace declaration for the w prefix.
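
A minimal sketch of that fix, assuming the XML is otherwise identical to the sample in the question: declare the w prefix on the root element. The URI below is the WordprocessingML main namespace; the helper countParseErrors() is hypothetical and exists only to compare the two inputs. libxml_use_internal_errors() is used here so that parser messages are collected and counted rather than printed.

```php
<?php
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// The question's XML, once without and once with the namespace declaration.
$broken = '<w:document><w:body><w:p><w:r><w:t>Testing</w:t></w:r></w:p></w:body></w:document>';
$fixed  = '<w:document xmlns:w="' . $ns . '"><w:body><w:p><w:r><w:t>Testing</w:t></w:r></w:p></w:body></w:document>';

// Collect libxml messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: parse the XML and return how many messages libxml recorded.
function countParseErrors(string $xml): int
{
    libxml_clear_errors();
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    return count(libxml_get_errors());
}

$brokenErrors = countParseErrors($broken);
$fixedErrors  = countParseErrors($fixed);

echo "without declaration: $brokenErrors message(s)\n";
echo "with declaration: $fixedErrors message(s)\n";
```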

You need to set error reporting to the highest level (E_ALL), enable the display of errors in your development environment, and enable error logging in general. You can then follow the error log:

Warning: DOMDocument::loadXML(): Namespace prefix w on document is not defined in Entity, line: 1 in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 15

Warning: DOMDocument::loadXML(): Namespace prefix w on body is not defined in Entity, line: 2 in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 15

Warning: DOMDocument::loadXML(): Namespace prefix w on p is not defined in Entity, line: 3 in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 15

Warning: DOMDocument::loadXML(): Namespace prefix w on r is not defined in Entity, line: 4 in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 15

Warning: DOMDocument::loadXML(): Namespace prefix w on t is not defined in Entity, line: 5 in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 15

Warning: DOMXPath::query(): Undefined namespace prefix in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 17

Warning: DOMXPath::query(): Invalid expression in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 17

Fatal error: Call to a member function item() on a non-object in /tmp/execpad-1d8a88cab4fd/source-1d8a88cab4fd on line 18

As these show, there are many problems with the XML, which in the end render the XPath query invalid and finally bring your whole script to a halt.
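
The error-reporting setup described above, together with a guard against the final fatal error, might look like the sketch below. It assumes a development environment; the small <root> document and the $status variable are placeholders for illustration. DOMXPath::query() returns false when the expression cannot be evaluated (for example, because of an undefined prefix), and calling item() on that false value is what produces the "non-object" fatal error.

```php
<?php
// Development settings: report every error and show it on screen.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Placeholder document without any namespace declarations.
$dom = new DOMDocument();
$dom->loadXML('<root><item>Testing</item></root>');
$xpath = new DOMXPath($dom);

// The w prefix is not registered, so the query fails. The "@" only
// silences the warning here so the return value can be inspected.
$result = @$xpath->query('/w:document');

// Check the return value before calling item() or iterating.
if ($result === false) {
    $status = 'query failed';
} else {
    $status = 'matched ' . $result->length . ' node(s)';
}
echo $status, "\n";
```

In real code you would leave out the "@" during development so that the warnings stay visible.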

hakre