I need to convert and compile multiple XML files (in a standard format) into a single CSV file. Because I also need to perform computations on some of the imported elements, XSLT is not an option (Stack Overflow: XML to CSV Using XSLT) unless I perform the computations on each converted CSV file afterwards.
XPath has been suggested as an alternative to SAX2, but because the final CSV output is large (built from over 100 XML files) I am hesitant to hold everything in arrays (Stack Overflow: Convert XML file to CSV).
Using SAX2 I have been somewhat successful in extracting the tag elements.
If I could append the output for each individual file to the final CSV file, I assume the application would be more memory-stable.
I hope others would benefit from knowing the answer to the question: How can I efficiently handle computations in conjunction with XML-CSV conversions for large-scale data?
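To illustrate the append-per-file idea, here is a minimal sketch in Python (assuming Python is an option; the helper name `append_rows` is my own). Opening the output CSV in append mode after each converted file means only one file's rows are ever held in memory:

```python
import csv
import os

def append_rows(csv_path, rows, header=None):
    """Append the rows produced from one converted XML file.

    Opening in append mode ("a") flushes each file's results to disk
    immediately, so memory use is bounded by a single input file.
    The header is written only when the CSV does not exist yet.
    """
    write_header = header is not None and not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
```

Calling this once per parsed XML file keeps the overall pipeline streaming rather than accumulating one large array.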
XML file 1
<element id="1">
<info>Yes</info>
<startValue>0</startValue> <!-- Value entered twice, ignore -->
<startValue>256</startValue>
<stopValue>64</stopValue>
</element>
<element id="2">
<info>No</info>
<startValue>50</startValue>
<stopValue>25</stopValue>
</element>
<....
XML file 2
<element id="1">
<info>No</info>
<startValue>128</startValue>
<stopValue>100</stopValue>
</element>
<....
Pseudopseudocode
for all files
get ID
get info
    for all start and stop values
        ignore duplicate/wrong values (use a counter)
        difference = startValue - stopValue   (e.g. 256 - 64 = 192, 128 - 100 = 28)
append (ID, info and difference) to file "outputfile.csv"
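Fleshing out the pseudocode, a sketch using Python's built-in SAX parser (`xml.sax`). Two assumptions are baked in: each XML file has a single root element wrapping the `<element>` entries, and when a value appears twice the last occurrence is the corrected one (as in the sample, where the startValue 0 is superseded by 256):

```python
import csv
import xml.sax

class ElementHandler(xml.sax.ContentHandler):
    """Collects one (id, info, difference) tuple per <element>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (id, info, difference) tuples
        self.text = ""
        self.current = {}

    def startElement(self, name, attrs):
        if name == "element":
            self.current = {"id": attrs.get("id"),
                            "starts": [], "stops": []}
        self.text = ""        # reset text buffer for the new tag

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        if name == "info":
            self.current["info"] = self.text.strip()
        elif name == "startValue":
            self.current["starts"].append(int(self.text))
        elif name == "stopValue":
            self.current["stops"].append(int(self.text))
        elif name == "element":
            # duplicate-entry rule (assumption): the last startValue is
            # the corrected one, so earlier duplicates are ignored
            diff = self.current["starts"][-1] - self.current["stops"][0]
            self.rows.append((self.current["id"],
                              self.current["info"], diff))

def convert(xml_paths, csv_path):
    """Parse each XML file with SAX and append its rows to one CSV."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["File", "ID", "Info", "Difference"])
        for file_no, path in enumerate(xml_paths):
            handler = ElementHandler()
            xml.sax.parse(path, handler)   # streams; no full DOM in memory
            writer.writerows((file_no,) + row for row in handler.rows)
```

Because SAX streams each file and the rows are written out before the next file is parsed, memory stays bounded by the largest single input file.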
CSV Output Example
File ID Info Difference Etc
_________________________________________________
0 1 Yes 192 ....
0 2 No 25 ....
1 1 No 28 ....
. ... ... ....
. ... ... ....
nfiles