I have a gigantic XML file (around 10 GB) containing information about numerous customers, which I need to convert to CSV. The problem is that many customers have extra fields that other customers don't, and some of the fields are repeated. An example of the XML:
<customer>
  <customerID>1</customerID>
  <auc>
    <algoId>0</algoId>
    <kdbId>1</kdbId>
    <acsub>1</acsub>
  </auc>
</customer>
<customer>
  <customerID>2</customerID>
  <auc>
    <algoId>0</algoId>
    <kdbId>1</kdbId>
    <acsub>1</acsub>
    <extraBit>12345</extraBit>
  </auc>
  <auc>
    <algoId>2</algoId>
    <kdbId>3</kdbId>
    <acsub>3</acsub>
    <extraBit>67890</extraBit>
  </auc>
  <customOptions>
    <odboc>0</odboc>
    <odbic>0</odbic>
    <odbr>1</odbr>
    <odboprc>0</odboprc>
    <odbssm>0</odbssm>
  </customOptions>
</customer>
As you can see, the first customer has only one auc block, but the second has two, and its auc blocks also contain an extra tag, extraBit. My questions:
I need to process one customer at a time (from <customer> to </customer>, then the next one, and so on), because loading all 10 GB at once will crash the system.
I am using XML::Twig in a loop, and when I try to read extraBit for customer 1, the program dies with an 'undefined value' error:
print $customer->first_child('extraBit')->text()
Can't call method "text" on an undefined value at xml-tags.pl line 50.
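The error appears because first_child returns undef when no matching child exists, so the chained ->text call has nothing to act on. A minimal sketch of a guarded lookup follows; it assumes XML::Twig is installed, and the <customers> root element and the inline sample are made up for illustration, not taken from the real file:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical two-record sample modelled on the XML above;
# the <customers> root element is an assumption.
my $xml = <<'XML';
<customers>
<customer><customerID>1</customerID><auc><algoId>0</algoId></auc></customer>
<customer><customerID>2</customerID><auc><algoId>0</algoId><extraBit>12345</extraBit></auc></customer>
</customers>
XML

my @seen;    # collected [customerID, extraBit] pairs

my $twig = XML::Twig->new(
    twig_handlers => {
        customer => sub {
            my ($t, $customer) = @_;
            my $auc = $customer->first_child('auc');

            # first_child returns undef when there is no such child,
            # so test the result before calling a method on it.
            my $extra_elt = $auc ? $auc->first_child('extraBit') : undef;
            my $extra     = $extra_elt ? $extra_elt->text : '';

            push @seen, [ $customer->first_child_text('customerID'), $extra ];
            $t->purge;    # release the customer that was just handled
        },
    },
);
$twig->parse($xml);

print join(',', @$_), "\n" for @seen;
```

A shorter alternative is first_child_text('extraBit'), which returns an empty string when the child is missing. Note also that extraBit sits inside auc, not directly under customer, so the lookup has to start from the auc element.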
For customers with more than one auc block, I want the additional values written to the CSV file as extra columns, with empty fields where a customer has no value:
customerID,algoId,kdbId,acsub,extraBit,algoId2,kdbId2,acsub2,extraBit2
1,0,1,1,,,,,
2,0,1,1,12345,2,3,3,67890
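The layout above could be produced roughly as follows. This is only a sketch under stated assumptions: the maximum number of auc blocks per customer is known in advance (the hypothetical $MAX_AUC), the file has a single root element (here called <customers>), and no value contains a comma or quote (otherwise a CSV module such as Text::CSV would be the safer choice). The names @AUC_FIELDS, header_row and customer_row are illustrative only.

```perl
use strict;
use warnings;
use XML::Twig;

my @AUC_FIELDS = qw(algoId kdbId acsub extraBit);
my $MAX_AUC    = 2;    # assumption: no customer has more auc blocks than this

# Header: customerID, then one group of auc columns per slot
# (no suffix for the first group, "2", "3", ... for the rest).
sub header_row {
    my @cols = ('customerID');
    for my $n (1 .. $MAX_AUC) {
        my $suffix = $n == 1 ? '' : $n;
        push @cols, map { $_ . $suffix } @AUC_FIELDS;
    }
    return join ',', @cols;
}

# One CSV row per customer; missing auc blocks or fields become empty strings.
sub customer_row {
    my ($customer) = @_;
    my @aucs = $customer->children('auc');
    my @row  = ($customer->first_child_text('customerID'));
    for my $i (0 .. $MAX_AUC - 1) {
        my $auc = $aucs[$i];
        push @row, map { $auc ? $auc->first_child_text($_) : '' } @AUC_FIELDS;
    }
    return join ',', @row;
}

# Small inline sample standing in for the real 10 GB file.
my $xml = <<'XML';
<customers>
<customer><customerID>1</customerID>
<auc><algoId>0</algoId><kdbId>1</kdbId><acsub>1</acsub></auc></customer>
<customer><customerID>2</customerID>
<auc><algoId>0</algoId><kdbId>1</kdbId><acsub>1</acsub><extraBit>12345</extraBit></auc>
<auc><algoId>2</algoId><kdbId>3</kdbId><acsub>3</acsub><extraBit>67890</extraBit></auc>
</customer>
</customers>
XML

# Write to an in-memory handle here; a real run would open an output file.
my $csv = '';
open my $out, '>', \$csv or die "cannot open in-memory handle: $!";
print {$out} header_row(), "\n";

my $twig = XML::Twig->new(
    twig_handlers => {
        customer => sub {
            my ($t, $customer) = @_;
            print {$out} customer_row($customer), "\n";
            $t->purge;    # discard the parsed customer so memory stays flat
        },
    },
);
$twig->parse($xml);    # for a file on disk, parsefile($path) streams it instead
close $out;

print $csv;
```

The handler runs once per closing </customer> tag, and purge drops what has been parsed so far, so only one customer is held in memory at a time. If the maximum number of auc blocks is not known beforehand, a first pass to count them would be needed before writing the header.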