
How would I read multiple XML files from an input stream in Java and write them as XML files?

I have this:

InputStream is = new GZIPInputStream(new FileInputStream(file));

Edit: "file" is a tar.gz archive, say xmls.tar.gz, that contains multiple XML files. When I convert the stream to a string using:

public static String convertStreamToString(java.io.InputStream is) {
        java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
        return s.hasNext() ? s.next() : "";
    }

I get all of the XML files chained together, with each file's header information mixed in. Printing the result with System.out.println gives (this is just the beginning of one file):

blah.xml    60      0      0        2300 12077203627  10436 0ustar     0      0 <?xml version="1.0"...

ANSWER:

This worked great for me, following Keith's suggestion to use Apache Commons Compress and Commons IO:

http://thinktibits.blogspot.com/2013/01/read-extract-tar-file-java-example.html

import java.io.*;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;
public class unTar {
        public static void main(String[] args) throws Exception {
                /* Read TAR File into TarArchiveInputStream.
                   For a .tar.gz, wrap the FileInputStream in a java.util.zip.GZIPInputStream first. */
                try (TarArchiveInputStream myTarFile = new TarArchiveInputStream(new FileInputStream(new File("tar_ball.tar")))) {
                        /* To read individual TAR entries */
                        TarArchiveEntry entry;
                        /* Create a loop to read every single entry in TAR file */
                        while ((entry = myTarFile.getNextTarEntry()) != null) {
                                /* Directory entries have no content to write */
                                if (entry.isDirectory()) {
                                        continue;
                                }
                                /* Get the name of the file */
                                String individualFiles = entry.getName();
                                /* Some SOP statements to check progress */
                                System.out.println("File Name in TAR File is: " + individualFiles);
                                System.out.println("Size of the File is: " + entry.getSize());
                                /* Make sure the parent directory exists before writing */
                                File target = new File(individualFiles);
                                if (target.getParentFile() != null) {
                                        target.getParentFile().mkdirs();
                                }
                                /* Copy the current entry to a physical file. A single read() call may return
                                   fewer bytes than requested, so IOUtils.copy loops until the entry ends. */
                                try (FileOutputStream outputFile = new FileOutputStream(target)) {
                                        IOUtils.copy(myTarFile, outputFile);
                                }
                        }
                }
        }
}
John

1 Answer


After decompressing (gzip) you still need to un-tar. The JDK doesn't have a built-in API for tar, but several are available from third parties. See this answer: How do I extract a tar file in Java?

Keith
  • Isn't my InputStream is = new GZIPInputStream(new FileInputStream(file)); code exactly what the answer in the question you linked to suggests? – John Jul 17 '13 at 13:10
  • No, read the answers other than the accepted/first one. The first answer, and your GZIPInputStream, just give you one stream of bytes for all files in the tar. That is OK if you want to parse those bytes yourself to figure out where each component of the tar ends, etc. It is better to use a higher-level API that lets you loop over objects like "TarEntry" and get an input stream from each of those, representing each (in your case) XML file in the tar. The later answers show how to do this with code from various libraries. – Keith Jul 17 '13 at 13:17
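
To illustrate what "parsing those bytes yourself" would involve, here is a minimal JDK-only sketch. It assumes the basic tar layout: 512-byte header blocks with the name at offset 0, the size in octal at offset 124 and the type flag at offset 156, followed by the file data padded to a multiple of 512 bytes. The class MiniTarReader and its helpers readEntries, entry and field are hypothetical names made up for this example. It ignores long-name extensions, the ustar prefix field and header checksums, so a library such as Commons Compress is still the safer choice for real archives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
public class MiniTarReader {
    private static final int BLOCK = 512;

    // Reads regular-file entries from an already decompressed tar stream into a name -> bytes map.
    public static Map<String, byte[]> readEntries(InputStream tar) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(tar);
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without an end-of-archive marker
            }
            if (header[0] == 0) {
                break; // simplification: an empty name is treated as the all-zero end block
            }
            String name = field(header, 0, 100);
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // size is stored as octal text
            byte type = header[156];
            byte[] data = new byte[(int) size]; // assumes entries smaller than 2 GB
            in.readFully(data);
            // File data is padded with zeros up to the next 512-byte boundary.
            in.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]);
            if (type == '0' || type == 0) { // keep regular files only
                entries.put(name, data);
            }
        }
        return entries;
    }

    // Extracts a NUL-terminated ASCII field from a header block.
    private static String field(byte[] block, int offset, int length) {
        int end = offset;
        while (end < offset + length && block[end] != 0) {
            end++;
        }
        return new String(block, offset, end - offset, StandardCharsets.US_ASCII);
    }

    // Builds one tar entry (header + padded data) in memory so the demo needs no files on disk.
    static byte[] entry(String name, byte[] content) throws IOException {
        byte[] header = new byte[BLOCK];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, nameBytes.length);
        byte[] size = String.format("%011o", content.length).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(size, 0, header, 124, size.length);
        header[156] = '0'; // type flag for a regular file
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header);
        out.write(content);
        out.write(new byte[(BLOCK - content.length % BLOCK) % BLOCK]);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny two-file tar, gzip it, then read it back the way a tar.gz would be read.
        ByteArrayOutputStream tar = new ByteArrayOutputStream();
        tar.write(entry("a.xml", "<a/>".getBytes(StandardCharsets.US_ASCII)));
        tar.write(entry("b.xml", "<b/>".getBytes(StandardCharsets.US_ASCII)));
        tar.write(new byte[2 * BLOCK]); // end-of-archive marker: two zero blocks
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(gz)) {
            gzip.write(tar.toByteArray());
        }
        InputStream is = new GZIPInputStream(new ByteArrayInputStream(gz.toByteArray()));
        for (Map.Entry<String, byte[]> e : readEntries(is).entrySet()) {
            System.out.println(e.getKey() + ": " + new String(e.getValue(), StandardCharsets.US_ASCII));
        }
    }
}
```

The header block is the "file information" that shows up in front of each XML document when the whole stream is dumped into one string. Skipping it, and the padding after each file, is exactly the bookkeeping a higher-level tar API does for you.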