Below is the code I'm currently working with.
The input XML file is available here: http://pastebin.com/hcQhPSjs
header("Content-Type: text/plain");
$xmlFile = new DOMDocument();
$xmlFile->preserveWhiteSpace = false;
$xmlFile->load("file:///srv/http/nginx/html/xml/UNSD_Quest_Sample.xml");
$xpath = new DOMXPath($xmlFile);
$hier = '//Workbook';
$result = $xpath->query($hier);
foreach ($result as $element) {
    print $element->nodeValue;
    print "\n";
}
Now, for the $hier variable, the query returns no results unless I use the wildcard * to reach the nodes I need. So instead of the usual /Workbook/Worksheet/Table/Row/Cell/Data way of addressing nodes, I'm forced to use /*/*[6]/*[2]/*
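To make the comparison concrete, here is a minimal sketch of the three query styles against a small inline document. It assumes that the default namespace declared on <Workbook> is what affects matching; the inline XML is a made-up stand-in for the real export, and the prefix wb is a hypothetical name I chose, not something from the file.

```php
<?php
// Hypothetical inline sample that mimics the root element of the Excel export.
$xml = <<<XML
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet>
    <Table>
      <Row><Cell><Data>hello</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Unprefixed names in XPath 1.0 only match elements in no namespace.
$plain = $xpath->query('/Workbook/Worksheet/Table/Row/Cell/Data');

// Wildcards match elements regardless of their namespace.
$wild = $xpath->query('/*/*/*/*/*/*');

// Bind an arbitrary prefix (hypothetical name "wb") to the namespace URI.
$xpath->registerNamespace('wb', 'urn:schemas-microsoft-com:office:spreadsheet');
$prefixed = $xpath->query('/wb:Workbook/wb:Worksheet/wb:Table/wb:Row/wb:Cell/wb:Data');

printf("plain: %d, wildcard: %d, prefixed: %d\n",
    $plain->length, $wild->length, $prefixed->length);
```

In this sketch the unprefixed path matches nothing, while the wildcard path and the prefixed path each match the single Data element, which looks like the same behaviour I'm seeing with the real file.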
The input file is an Excel spreadsheet exported to XML. It seems the issue might be in the export from XLS to XML.
What I find peculiar is that Firefox (my default browser) does not show the namespace attributes on the root element <Workbook>, while Chromium and any text editor do.
Firefox:
<?mso-application progid="Excel.Sheet"?>
<Workbook>
<DocumentProperties>
<Author>Htike Htike Kyaw Soe</Author>
<Created>2014-01-14T20:37:41Z</Created>
<LastSaved>2014-12-04T10:05:11Z</LastSaved>
<Version>14.00</Version>
</DocumentProperties>
<OfficeDocumentSettings>
<AllowPNG/>
</OfficeDocumentSettings>
Chromium:
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Htike Htike Kyaw Soe</Author>
<Created>2014-01-14T20:37:41Z</Created>
<LastSaved>2014-12-04T10:05:11Z</LastSaved>
<Version>14.00</Version>
</DocumentProperties>
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
<AllowPNG/>
</OfficeDocumentSettings>
Could anyone explain why this is the case?