18

I need help understanding the steps involved in converting an XML file into a CSV file using java. Here is an example of an XML file

<?xml version="1.0"?>
<Sites>
<Site id="101" name="NY-01" location="New York">
    <Hosts>
        <Host id="1001">
           <Host_Name>srv001001</Host_Name>
           <IP_address>10.1.2.3</IP_address>
           <OS>Windows</OS>
           <Load_avg_1min>1.3</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1002">
           <Host_Name>srv001002</Host_Name>
           <IP_address>10.1.2.4</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>1.4</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1003">
           <Host_Name>srv001003</Host_Name>
           <IP_address>10.1.2.5</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>3.3</Load_avg_1min>
           <Load_avg_5min>1.6</Load_avg_5min>
           <Load_avg_15min>1.8</Load_avg_15min>
        </Host>
        <Host id="1004">
           <Host_Name>srv001004</Host_Name>
           <IP_address>10.1.2.6</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>2.3</Load_avg_1min>
           <Load_avg_5min>4.5</Load_avg_5min>
           <Load_avg_15min>4.2</Load_avg_15min>
        </Host>     
    </Hosts>
</Site>
</Sites>

and here is the resulting CSV file.

site_id, site_name, site_location, host_id, host_name, ip_address, operative_system, load_avg_1min, load_avg_5min, load_avg_15min
101, NY-01, New York, 1001, srv001001, 10.1.2.3, Windows, 1.3, 2.5, 1.2
101, NY-01, New York, 1002, srv001002, 10.1.2.4, Linux, 1.4, 2.5, 1.2
101, NY-01, New York, 1003, srv001003, 10.1.2.5, Linux, 3.3, 1.6, 1.8
101, NY-01, New York, 1004, srv001004, 10.1.2.6, Linux, 2.3, 4.5, 4.2

I was thinking of using a DOM parser to read the xml file. The problem I have with that is I would need to specify specific elements in to code by name, but I want it to be able to parse it without doing that.

Are there any tools or libraries in java that would be able to help me achieve this.

If I have a XML file of this format below and want to add the value of the InitgPty in the same row with MSgId (Pls note :InitgPty is in the next tag level, so it prints the value in the next row)

<?xml version="1.0"?>
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>XYZ07/ABC</MsgId>
<NbOfTxs>100000</NbOfTxs>
<InitgPty>
<Nm>XYZ</Nm>
</InitgPty>
Emre801
  • 3,043
  • 5
  • 25
  • 28

6 Answers6

35

here's a working example, data.xml has your data:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class Xml2Csv {

    public static void main(String args[]) throws Exception {
        File stylesheet = new File("src/main/resources/style.xsl");
        File xmlSource = new File("src/main/resources/data.xml");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

style.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
<xsl:for-each select="//Host">
<xsl:value-of select="concat(Host_Name,',',IP_address,',',OS,Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

output:

Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
srv001001,10.1.2.3,Windows1.3,2.5,1.2
srv001002,10.1.2.4,Linux1.4,2.5,1.2
srv001003,10.1.2.5,Linux3.3,1.6,1.8
srv001004,10.1.2.6,Linux2.3,4.5,4.2
Guy Gavriely
  • 11,228
  • 6
  • 27
  • 42
  • I tried to work around this code to build only one csv, but couldnt. Any hints on how I can do that? I have more than a 100 XML files which I need to convert to a single csv file. – user3270763 Feb 13 '17 at 22:29
  • I tried this but i am getting name of cloumns only...What should i change? – ZIA ANSARI Nov 11 '18 at 10:08
  • so the style.xsl file, if you don't have it, will you need to create it manually? What happen if you don't want the headers in your CSV? How that could be done? – IoT user Mar 09 '20 at 10:55
  • @iot-user, Editing the style.xsl should get you what you want. Remove the header line in it. – codester Sep 16 '20 at 15:20
  • How to fetch sub-nodes data to csv Example: 0z342bvff world_catalon Catalon – Nagarjuna Yalamanchili Feb 05 '21 at 09:29
3

Your best best is to use XSLT to "transform" the XML to CSV. There are some Q/As on so (like here) that cover how to do this. The key is to provide a schema for your source data so the XSLT transform process knows how to read it so it can properly format the results.

Then you can use Xalan to input the XML, read the XSLT and output your results.

Community
  • 1
  • 1
Pedantic
  • 5,032
  • 2
  • 24
  • 37
2

Three steps:

  1. Parse the XML file into a java XML library object.
  2. Retrieve relevant data from the object for each row.
  3. Write the results to a text file using native java functions, saving with *.csv extension.
Community
  • 1
  • 1
Jono
  • 3,949
  • 4
  • 28
  • 48
1

The answer has already been provided by Pedantic (using the DOM-like approach {Document Object Model}) and Jono (with the SAX-like approach this time) in January.

My opinion is that both methods work well for small files but the latter works better with big XML files. You didn't mention the actual size of your XML files but you should take this into account.

Whatever method is used a specific program (which would detect special tags tailored to your local XML) will be easier to write but won't work without code adaptations for another XML flavor, while a more generic program will be harder to devise but will work for all XML files. You said you wanted to be able to parse a file without specifying specific element names so I guess the generic approach is what you prefer, and I agree with that, but please note that it's easier said than done. Indeed, I had the same problem on january too, implying this time a big XML file (>>100Mo) and I was surprised that nothing was available over the Internet so far. Turning frustration into something better is always a good thing so I decided to deal with that specific problem in the most generic way by myself, with a special concern for the big-XML-file-issue.

You might be interested to know that the generic Java library I wrote, which is now published as free software, converted your XML file into CSV the way you expected (in -x -u mode {please refer to the documentation for further information}).

So the answer to the last part of your question is: yes, there is at least one library which will help you achieve your goal, mine, which is named "XML2CSV-Generic-Converter". There might be other ones of course, and better ones certainly, but I couldn't pick any decent (free) one by myself.

I won't provide any link here to comply with Peter Foti 's judicious remark - but if you key "XML2CSV-Generic-Converter" in your favorite search engine you should find it easily.

Lochrann
  • 11
  • 2
0

your file looks really flat and simple. You don't necessarily need an XML parser to convert it. Just parse it with LineNumberReader.readLine() and use regexp to extract specific fields.

Another option is to use StAX, a streaming API for XML processing. It's pretty simple and you don't need to load the whole document in RAM.

injecteer
  • 20,038
  • 4
  • 45
  • 89
0

http://beanio.org/2.1/docs/reference/index.html#Records This is one of the Quick and robust solution.

Amol
  • 23
  • 3