
I am using Apache Kafka to read in multiple XML files. I want to convert each XML file into a flat file (CSV or plain text). Example inputs and the desired output are shown below.

Would parsing the XML into a DOM be a good solution, or should I use the Jackson XML data converter?

Can anyone comment on the best solution to achieve this? Thanks!

Input 1:

<?xml version="1.0" encoding="UTF-8"?>
<customer>
   <id>123</id>
   <firstName>Jane</firstName>
   <phoneNumbers type="work">555-1111</phoneNumbers>
</customer>

Input 2:

<?xml version="1.0" encoding="UTF-8"?>
<customer>
   <id>1234</id>
   <firstName>Bob</firstName>
   <phoneNumbers type="work">555-1111</phoneNumbers>
</customer>

Output:

<?xml version="1.0" encoding="UTF-8"?><customer><id>123</id><firstName>Jane</firstName><phoneNumbers type="work">555-1111</phoneNumbers></customer>

<?xml version="1.0" encoding="UTF-8"?><customer><id>1234</id><firstName>Bob</firstName><phoneNumbers type="work">555-1111</phoneNumbers></customer>
Defcon

2 Answers


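Good question. One way to do it is with a bash script that calls xmllint once per field and appends one comma-separated line per XML file to combined.csv: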

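#!/bin/bash
# Build combined.csv with one row (id,firstName,phoneNumbers) per XML file.

: > combined.csv   # start from an empty output file
for xml in *.xml
do
  echo "Processing $xml"
  # string(...) yields the text content of the first matching node
  id=$(xmllint --xpath "string(//customer/id)" "$xml")
  firstname=$(xmllint --xpath "string(//customer/firstName)" "$xml")
  phonenumber=$(xmllint --xpath "string(//customer/phoneNumbers)" "$xml")
  # Fixed format string, so characters such as % in the data are printed as-is
  printf '%s,%s,%s\n' "$id" "$firstname" "$phonenumber" >> combined.csv
done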
Yixin Xia
  • Oh interesting solution. How do I iterate through a whole series of xml without hardcoding every field? The real file is really long and has lots of fields. – Defcon Apr 12 '16 at 05:31
  • I would try to use xpath to get all the names of a node, and iterate over that. – Yixin Xia Apr 12 '16 at 05:31
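
Since the question is tagged java, here is a rough Java sketch of the same idea of walking every child node instead of naming each field. It uses the JDK's built-in DOM parser rather than xmllint, and it assumes the records are flat (one level of child elements under the root, as in the sample inputs). The class and method names are made up for illustration, attributes such as type="work" are ignored, and values containing commas or quotes are not escaped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one flat XML record into one CSV row.
final class XmlToCsvRow {

    // Joins the text of every direct child element of the root with commas.
    static String toCsvRow(String xml) {
        return String.join(",", collect(xml, false));
    }

    // Joins the names of every direct child element of the root with commas.
    static String toCsvHeader(String xml) {
        return String.join(",", collect(xml, true));
    }

    private static List<String> collect(String xml, boolean names) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // trim() because the XML declaration must be the very first thing in the input
            Document doc = builder.parse(new InputSource(new StringReader(xml.trim())));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip the whitespace-only text nodes between elements
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(names ? child.getNodeName()
                                     : child.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<customer>",
                "   <id>123</id>",
                "   <firstName>Jane</firstName>",
                "   <phoneNumbers type=\"work\">555-1111</phoneNumbers>",
                "</customer>");
        System.out.println(toCsvHeader(xml)); // id,firstName,phoneNumbers
        System.out.println(toCsvRow(xml));    // 123,Jane,555-1111
    }
}
```

If the real files nest elements more deeply, the loop would need to recurse, and you would have to decide how nested names map to column names.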

Since you tagged your question with java, I'll assume you're using the Kafka Producer Java client.

If that's the case then you can do the conversion in your Producer implementation, using something like this.
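
As a minimal sketch of what that conversion step might look like, assuming "flat" means one XML document per line as in the question's sample output: the helper below collapses the whitespace between tags so each document fits on a single line. The class and method names are hypothetical, and the code that actually hands the string to the producer is left out. Note that this is a plain regex, not a parser, so it would also collapse whitespace between angle brackets inside comments or CDATA sections.

```java
// Hypothetical helper: collapses a pretty-printed XML document onto one line.
final class XmlLineFlattener {

    // Removes leading/trailing whitespace and any whitespace run between '>' and '<'.
    static String flatten(String xml) {
        return xml.trim().replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<customer>",
                "   <id>123</id>",
                "   <firstName>Jane</firstName>",
                "   <phoneNumbers type=\"work\">555-1111</phoneNumbers>",
                "</customer>");
        // Prints the whole document on a single line
        System.out.println(flatten(xml));
    }
}
```

The flattened string can then be used as the record value, so that each message written downstream corresponds to exactly one line of the output file.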

Marko Bonaci