
I have the following nested XML, which I would like to stream-parse with Node.js into a Postgres database. The XML here is reduced to a reproducible example; the real file is large.

<MarketDocument>
    <createdDateTime>2018-02-17T16:42:28Z</createdDateTime>
    <TimeSeries>
        <Type>A01</Type>
        <Period>
            <Point><position>1</position></Point>
            <Point><position>2</position></Point>
        </Period>
    </TimeSeries>
    <TimeSeries>
        <Type>B01</Type>
        <Period>
            <Point><position>3</position></Point>
            <Point><position>4</position></Point>
        </Period>
    </TimeSeries>
</MarketDocument>

Expected output: [["A01", 1], ["A01", 2], ["B01", 3], ["B01", 4]]

Main problem: iterating over the <Point> elements while keeping the parent's <Type>. I haven't found good documentation on this problem. I would like to work along the lines of the approach by forrert.

Question:
1) Do you have an idea to parse this correctly with Node.js?
2) Maybe there is another approach: let me know.


I basically need help with the following part:

var fs = require('fs');
var XmlStream = require('xml-stream');

var stream = fs.createReadStream('./here.xml'); // or stream directly from your online source
var xml = new XmlStream(stream);

xml.on('endElement: TimeSeries', function(item) {

    // PHP code: how do you do this in Node.js?
    // foreach ($item->Period->Point as $point) {
    //     $position = $point->position;
    //     $array[] = "('$Type', '$position')";
    // }

});
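
For reference, here is a minimal sketch of how that loop could look in Node.js with xml-stream, assuming its collect() feature is used so repeated <Point> elements arrive as an array (element and file names are taken from the sample above):

var fs = require('fs');
var XmlStream = require('xml-stream');

var stream = fs.createReadStream('./here.xml');
var xml = new XmlStream(stream);

// Without collect(), xml-stream keeps only the last <Point> per <Period>;
// collect() gathers the repeated elements into an array instead.
xml.collect('Point');

var rows = [];

xml.on('endElement: TimeSeries', function(item) {
    var type = item.Type; // e.g. 'A01'
    item.Period.Point.forEach(function(point) {
        // Text content arrives as a string, hence the parseInt
        rows.push([type, parseInt(point.position, 10)]);
    });
});

xml.on('end', function() {
    console.log(rows); // [["A01", 1], ["A01", 2], ["B01", 3], ["B01", 4]]
});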

Your help would be appreciated!

Fierr
  • Have you considered using a stream parser? – Brad Apr 19 '18 at 18:33
  • Yes and got it working but the main problem is the iteration part – Fierr Apr 19 '18 at 18:36
  • Not sure what problem you're having... using a stream parser, you can watch for that element type and output it as you go. If you're doing anything else, you should be querying your database after you have loaded the XML into it. XML is just a transfer format. – Brad Apr 19 '18 at 18:59
  • Sure, I will clarify. The problem is constructing the array in the expected form. I will add code – Fierr Apr 19 '18 at 19:10
  • So, your problem has nothing to do with XML parsing? You just want to do `arr.push([type, pos])`? – Brad Apr 19 '18 at 19:34
  • Thanks! You have put me in the right direction. Will try to use Postgres [itself](https://stackoverflow.com/questions/37059187/convert-object-array-to-array-compatible-for-nodejs-pg-unnest) to unravel the elements. – Fierr Apr 19 '18 at 21:09

1 Answer


All the approaches that were mentioned in forrert's answer seem fine to me. If the XML is really huge, you can split it into a few chunks and work on one chunk at a time, in order not to block the whole process.
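
As an illustration of that chunk-at-a-time idea, here is a minimal sketch that combines xml-stream with the pg driver and the unnest() bulk insert linked in the comments above. The points(type, position) table is hypothetical, the client is assumed to pick up its connection settings from the usual PG* environment variables, and it assumes that pausing the underlying read stream is enough to hold back input while a batch is written:

var fs = require('fs');
var XmlStream = require('xml-stream');
var pg = require('pg');

var client = new pg.Client(); // connection settings from PG* environment variables
client.connect();

var stream = fs.createReadStream('./here.xml');
var xml = new XmlStream(stream);
xml.collect('Point');

var BATCH_SIZE = 1000;
var batch = [];

// Bulk-insert one batch via unnest(); 'points' is a hypothetical table.
function flushBatch() {
    var types = batch.map(function(row) { return row[0]; });
    var positions = batch.map(function(row) { return row[1]; });
    batch = [];
    return client.query(
        'INSERT INTO points (type, position) SELECT * FROM unnest($1::text[], $2::int[])',
        [types, positions]
    );
}

xml.on('endElement: TimeSeries', function(item) {
    item.Period.Point.forEach(function(point) {
        batch.push([item.Type, parseInt(point.position, 10)]);
    });
    if (batch.length >= BATCH_SIZE) {
        stream.pause(); // hold back the file while this chunk is written
        flushBatch().then(function() {
            stream.resume();
        }).catch(function(err) {
            console.error(err);
        });
    }
});

xml.on('end', function() {
    // Flush whatever is left, then close the connection.
    var done = batch.length ? flushBatch() : Promise.resolve();
    done.then(function() { client.end(); });
});

Pausing the source stream gives a crude form of backpressure; for anything serious you would also want error handling around connect() and the final flush.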

Roee
  • Thanks for your comment. But how do you iterate over e.g. A01? – Fierr Apr 19 '18 at 18:39
  • The built-in XML parser should be fine, though for heavy parsing it might be better to do it on the server and then convert it to JSON. I thought the question was more about approaches to parsing large text; do you need help understanding how to parse it in the first place? – Roee Apr 19 '18 at 19:03
  • That's exactly right. I would like to do large text parsing on the server side. I have already written similar code in PHP, but would like to get it to work in NodeJS. Thanks though. I have added an example. – Fierr Apr 19 '18 at 19:26
  • Splitting an XML into chunks on your own seems like overkill - would you implement a stateful token parser? There are a lot of modules that can stream the xml, like [sax-stream](https://www.npmjs.com/package/sax-stream) – Michał Karpacki Apr 25 '18 at 20:23