0

I've successfully saved down this XML file to my server using the following PHP:

file_put_contents("test.xml", fopen("http://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/XmlView.aspx?data=yieldyear&year=2015", 'r'));

Now i'm trying to get it into my DB. Just trying to get it to work by executing SQL in the phpMyAdmin GUI. I successfully setup the following table:

CREATE TABLE `test` (
  `NEW_DATE` varchar(40) NOT NULL,
  `BC_1MONTH` int(11) NULL,
  `BC_3MONTH` int(11) NULL,
  PRIMARY KEY (`NEW_DATE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The following code runs without error, but just results in the addition of one empty row to my DB.

LOAD XML LOCAL INFILE 'test.xml'
INTO TABLE test
ROWS IDENTIFIED BY '<content>';

I see here that MySQL server looks for field names matching the column names of the target table. It is also mentioned that there is no requirement for every field in the XML file be matched with a column in the corresponding table. Fields which have no corresponding columns are skipped. Are my column names not matching the XML because of the XML prefixes and namespaces or am I barking up the wrong tree?

This question shows how to reference the same XML elements using PHP. Apparently prefix definition is required using registerXPathNamespace() in order to build an Xpath:

Do I need to build such an Xpath in SQL?

Perhaps I could remove the namespace/prefix data in PHP when I save the XML down using something like LIBXML_NSCLEAN; apparently this removes redundant namespace declarations. What qualifies as redundant?

Another option for removing namespaces seems to be XSL stylesheets. Per this question.

What is the best approach to solving this issue?

Community
  • 1
  • 1
DVCITIS
  • 1,067
  • 3
  • 16
  • 36
  • Any feedback people?! Have I made this too complicated? Don't make me put a bounty on it! If there is nothing wrong with the question, will some upvotes help get this answered? ;) I'll keep adding my research as I go. – DVCITIS Nov 19 '15 at 20:20

1 Answers1

1

Essentially, your XML is too complex (nested nodes, attributes, namespaces) to readily import into MySQL using LOAD XML. As your above link shows the statement supports only three different formats:

<row column1="value1" column2="value2" .../>

<row>
  <column1>value1</column1>
  <column2>value2</column2>
</row>

<row>
  <field name='column1'>value1</field>
  <field name='column2'>value2</field>
</row>

Therefore, you need to transform your raw XML into such format above aligned of course to your table's fields. Using XSLT can help tremendously. As information, XSLT is a special programming language that re-structures XML content to various forms for end-use needs. Like other general purpose languages including Java, C#, Python, and VB, PHP comes equipped with an XSLT 1.0 processor. You may need to enable the extension.

XSLT Script (save as .xsl or .xslt file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://www.w3.org/2005/Atom"
                              xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
                              xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
                              exclude-result-prefixes="doc m d">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="node()"/>
    </xsl:copy>
</xsl:template>

<!-- PARSING NEEDED CONTENT -->
<xsl:template match="doc:entry">
    <row>
        <NEW_DATE><xsl:value-of select="doc:content/m:properties/d:NEW_DATE"/></NEW_DATE>
        <BC_1MONTH><xsl:value-of select="doc:content/m:properties/d:BC_1MONTH"/></BC_1MONTH>
        <BC_3MONTH><xsl:value-of select="doc:content/m:properties/d:BC_3MONTH"/></BC_3MONTH>
    </row>
</xsl:template>

<!-- REMOVE UNNEEDED NODES -->
<xsl:template match="doc:title|doc:id|doc:update|doc:link|doc:updated"/>

</xsl:stylesheet>

PHP Script (loading and processing XML and XSL content)

$doc = new DOMDocument();

// PARSING DIRECTLY FROM WEB PAGE
$doc->load('http://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/XmlView.aspx?data=yieldyear&year=2015');

$xsl = new DOMDocument;
$xsl->load('XSLTScript.xsl');

// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 

// Transform XML source
$newXml = $proc->transformToXML($doc);

// Save output to file
$xmlfile = 'Output.xml';
file_put_contents($xmlfile, $newXml);

XML Output (now XML content can be used in LOAD XML to import into MySQL)

<?xml version="1.0" encoding="UTF-8"?>
<pre>
  <row>
    <NEW_DATE>2015-01-02T00:00:00</NEW_DATE>
    <BC_1MONTH>0.02</BC_1MONTH>
    <BC_3MONTH>0.02</BC_3MONTH>
  </row>
  <row>
    <NEW_DATE>2015-01-05T00:00:00</NEW_DATE>
    <BC_1MONTH>0.02</BC_1MONTH>
    <BC_3MONTH>0.03</BC_3MONTH>
  </row>
  <row>
    <NEW_DATE>2015-01-06T00:00:00</NEW_DATE>
    <BC_1MONTH>0.02</BC_1MONTH>
    <BC_3MONTH>0.03</BC_3MONTH>
  </row>
  ...
</pre>
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Just as a rhetorical Q... could he have loaded his XML dowload as-is ... if he were using PostgreSQL? – Paulb Nov 20 '15 at 19:04
  • `LOAD XML` is a MySQL functionality. PostgreSQL may have its own XML facility. However, xml as a data type is open-ended (left to the designer) which can have extensive nesting, while database tables are two-dimensional (row by column). So some type of parsing or query restructuring for complex xml may still be warranted for migration. – Parfait Nov 20 '15 at 20:09
  • Great answer, much appreciated! – DVCITIS Nov 21 '15 at 03:56