
I'm using GZIPOutputStream to compress an XML file into a .gz file, but after compressing it I find that the file's extension (.xml) is missing from the file name shown inside the .gz file. I need to keep the extension because the .gz file will be consumed by a third-party system that expects to get a .xml file after decompressing it. Is there any solution for this? My test code is:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipTest {

    public static void main(String[] args) {
        compress("D://test.xml", "D://test.gz");
    }

    private static boolean compress(String inputFileName, String targetFileName) {
        final int bufferSize = 1024 * 4;
        byte[] buffer = new byte[bufferSize];
        // try-with-resources closes every stream that was actually opened, in reverse order;
        // closing the GZIPOutputStream writes the gzip trailer before the file is closed
        try (FileInputStream fins = new FileInputStream(inputFileName);
             FileOutputStream fout = new FileOutputStream(targetFileName);
             GZIPOutputStream zout = new GZIPOutputStream(fout)) {
            int number;
            while ((number = fins.read(buffer, 0, bufferSize)) != -1) {
                zout.write(buffer, 0, number);
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}
Eric Jiang
  • GZIPOutputStream is not concerned with files, it just compresses the bytes you throw at it. The filename you save that stream to should be whatever you set in `targetFileName`. – Thilo Sep 27 '11 at 04:19
  • Yes. After running this code we get a file named "test.gz", and if we open it in an archive tool such as WinRAR, we see a file named "test" inside it (not "test.xml"). If we unzip "test.gz" directly, we get a file named "test" rather than "test.xml", which is the problem I mentioned. – Eric Jiang Sep 27 '11 at 04:32
    Yes, that's exactly why you need to have your compressed file named `test.xml.gz` - try it on a Unix/Linux system. If you strip off the file extension and replace it with "gz", of course you'll lose the extension. – JW8 Sep 27 '11 at 04:38
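
Thilo's point can be checked directly. This small sketch (the class name `GzipHeaderCheck` is hypothetical) gzips a few bytes in memory with java.util.zip.GZIPOutputStream and prints the first header fields. The FLG byte at offset 3 comes out as 0, so the FNAME bit (0x08) is clear and no original file name is stored; a tool opening "test.gz" can then only derive a name from the archive name itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderCheck {
    // Gzip-compresses the given bytes entirely in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream zout = new GZIPOutputStream(bytes)) {
            zout.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("<root/>".getBytes(StandardCharsets.UTF_8));
        // Bytes 0-1: magic number, byte 2: compression method (8 = deflate), byte 3: FLG
        System.out.printf("magic=%02x %02x cm=%d flg=%d%n",
                gz[0] & 0xff, gz[1] & 0xff, gz[2], gz[3]);
        // FLG bit 3 (0x08) is FNAME; when it is clear, no file name is stored in the header
        System.out.println("FNAME set: " + ((gz[3] & 0x08) != 0));
    }
}
```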

7 Answers


Maybe I'm missing something, but when I've gzipped files in the past, say test.xml, the output I got was test.xml.gz. If you change the output file name to test.xml.gz, you will preserve your original file extension.
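
The naming convention above can be captured in two small helpers: keep the whole original name and append ".gz" when compressing, and strip ".gz" to get the name back, which is the usual convention when a tool derives the output name from the archive name. The class and method names (`GzNames`, `gzipTargetName`, `gunzipTargetName`) are hypothetical.

```java
public class GzNames {
    // Name for the compressed file: keep the full original name, extension included, and append ".gz"
    static String gzipTargetName(String inputFileName) {
        return inputFileName + ".gz";
    }

    // Name recovered from the archive name alone: drop a trailing ".gz" if present
    static String gunzipTargetName(String gzFileName) {
        return gzFileName.endsWith(".gz")
                ? gzFileName.substring(0, gzFileName.length() - 3)
                : gzFileName;
    }

    public static void main(String[] args) {
        String gz = gzipTargetName("D://test.xml");
        System.out.println(gz);                               // D://test.xml.gz
        System.out.println(gunzipTargetName(gz));             // D://test.xml
        System.out.println(gunzipTargetName("D://test.gz"));  // D://test
    }
}
```

With these, the question's call would become `compress("D://test.xml", GzNames.gzipTargetName("D://test.xml"))`.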

JW8
  • Please see my comments above, you misunderstood my meaning. Anyway, thanks. – Eric Jiang Sep 27 '11 at 04:34
    @Eric: You need to call the file `test.xml.gz` if the original was `test.xml`. The file name is not stored anywhere else. – Thilo Sep 27 '11 at 04:39
  • @Eric - I think you really need to keep the "xml" in your filename. How else will the original file extension be pulled up? – JW8 Sep 27 '11 at 04:39
  • Yeah, thanks for the clarification. But my problem still exists: the file we get from a third-party system is named like "test.gz" (not test.xml.gz), yet we get test.xml after unzipping it. In other words, no matter what I rename the file to, e.g. "myTest.gz", I still get "test.xml" when I unzip it or view its contents with a gzip tool. That is the result I want. So even if I pass "test.xml.gz" to this method, once I rename the file to "test.gz" the .xml is lost again. Thanks. – Eric Jiang Sep 27 '11 at 05:32
    With gzip, you're only compressing the file (http://en.wikipedia.org/wiki/Gzip), so to keep ".xml", you'll need to append ".gz" to the filename. To keep the entire filename, you'd have to archive and compress (http://en.wikipedia.org/wiki/List_of_archive_formats). As one of the other posters has mentioned, you could tar the file and then gzip it. – JW8 Sep 27 '11 at 05:41
  • Is it possible to tar the file in Java code without using a third-party tool? How? – Eric Jiang Sep 27 '11 at 05:51
  • It's definitely possible. Check out this article to find out how to create a tar.gz file in one pass: http://www.selikoff.net/2010/07/28/creating-a-tar-gz-file-in-java/ – JW8 Sep 27 '11 at 16:29
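
On the follow-up question of tarring in plain Java: the ustar format is simple enough to write by hand for a single file. The sketch below (class and method names `TarGzSketch`, `tarHeader`, `writeTarGz` are hypothetical) writes one 512-byte ustar header, the file data padded to a 512-byte boundary, and two zero blocks marking the end of the archive, all through a GZIPOutputStream. It assumes an ASCII name of at most 100 bytes, mode 0644 and uid/gid 0; a maintained library such as Apache Commons Compress handles long names and other edge cases. Note that decompressing the result yields a tar archive containing test.xml, not test.xml itself, so the receiving system must be able to unpack tar.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class TarGzSketch {
    // Copies an ASCII string into the header at the given offset.
    private static void put(byte[] header, int offset, String value) {
        byte[] b = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, header, offset, b.length);
    }

    // Builds a 512-byte ustar header for one regular file (assumption: ASCII name, at most 100 bytes).
    static byte[] tarHeader(String name, long size, long mtimeSeconds) {
        if (name.getBytes(StandardCharsets.US_ASCII).length > 100) {
            throw new IllegalArgumentException("name longer than 100 bytes");
        }
        byte[] h = new byte[512];
        put(h, 0, name);                                      // name
        put(h, 100, "0000644\0");                             // mode
        put(h, 108, "0000000\0");                             // uid
        put(h, 116, "0000000\0");                             // gid
        put(h, 124, String.format("%011o\0", size));          // size, octal
        put(h, 136, String.format("%011o\0", mtimeSeconds));  // mtime, octal
        put(h, 148, "        ");                              // checksum counts as 8 spaces while summing
        h[156] = '0';                                         // typeflag: regular file
        put(h, 257, "ustar\0");                               // magic
        put(h, 263, "00");                                    // version
        int sum = 0;
        for (byte b : h) sum += b & 0xff;                     // unsigned sum of all header bytes
        put(h, 148, String.format("%06o\0 ", sum));           // checksum: 6 octal digits, NUL, space
        return h;
    }

    // Writes a single-entry tar archive, gzip-compressed, to out (and closes out).
    static void writeTarGz(OutputStream out, String name, byte[] data) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(tarHeader(name, data.length, System.currentTimeMillis() / 1000));
            gz.write(data);
            gz.write(new byte[(512 - data.length % 512) % 512]); // pad data to a 512-byte block
            gz.write(new byte[1024]);                            // end of archive: two zero blocks
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeTarGz(bytes, "test.xml", "<root/>".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + bytes.size() + " compressed bytes");
    }
}
```

For a real file, pass a `FileOutputStream` for something like "D://test.tar.gz" and the bytes read from "D://test.xml".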

Not sure what the problem is here. You are calling your own compress function

private static boolean compress(String inputFileName, String targetFileName)

with the following arguments

compress("D://test.xml", "D://test.gz");

Quite obviously you are going to lose the .xml portion of the filename, you never pass it into your method.

Perception

Your code is fine. Just give the output file name as "D://test.xml.gz"; you left out the original file extension (.xml).

   Ex: compress("D://test.xml", "D://test.xml.gz");
Satya

I had the same issue. I found that Apache Commons Compress has a similar class, GzipCompressorOutputStream, which can be configured with GzipParameters, including the file name stored in the gzip header:

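        final File compressedFile = new File("test-outer.xml.gz");
        final GzipParameters gzipParameters = new GzipParameters();
        gzipParameters.setFilename("test-inner.xml"); // name stored in the gzip header, independent of the .gz file name
        try (GzipCompressorOutputStream gzipOutputStream =
                 new GzipCompressorOutputStream(new FileOutputStream(compressedFile), gzipParameters)) {
            // hypothetical source path; closing the stream writes the gzip trailer
            java.nio.file.Files.copy(java.nio.file.Paths.get("test-inner.xml"), gzipOutputStream);
        }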

Dependency:

        <dependency>
          <groupId>org.apache.commons</groupId>
          <artifactId>commons-compress</artifactId>
          <version>1.8</version>
        </dependency>
Andreas

I created a copy of GZIPOutputStream and changed the code to allow for a different filename "in the gzip":

private final byte[] header = {
    (byte) GZIP_MAGIC,                // Magic number (short)
    (byte)(GZIP_MAGIC >> 8),          // Magic number (short)
    Deflater.DEFLATED,                // Compression method (CM)
    8,                                // Flags (FLG): 8 = FNAME, a file name follows the header
    0,                                // Modification time MTIME (int)
    0,                                // Modification time MTIME (int)
    0,                                // Modification time MTIME (int)
    0,                                // Modification time MTIME (int)
    0,                                // Extra flags (XFLG)
    0                                 // Operating system (OS)
};

private void writeHeader() throws IOException {
    out.write(header);
    // RFC 1952: the file name is ISO-8859-1 encoded and zero-terminated
    out.write("myinternalfilename".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1));
    out.write(new byte[] {0});
}

Info about gzip format: http://www.gzip.org/zlib/rfc-gzip.html#specification
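
The same idea can also be sketched without copying the JDK class, using only public JDK APIs: write the RFC 1952 header with the FNAME flag (0x08) set, follow it with the zero-terminated ISO-8859-1 name, then raw deflate data and the CRC-32/ISIZE trailer. The class and method names (`NamedGzip`, `gzipWithName`) are hypothetical; the sketch holds the whole input in memory, writes MTIME as 0 (which RFC 1952 defines as "no time stamp available") and OS as 255 (unknown).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class NamedGzip {
    // Writes a 32-bit value in little-endian order, as RFC 1952 requires.
    private static void writeIntLE(OutputStream out, long v) throws IOException {
        for (int i = 0; i < 4; i++) {
            out.write((int) (v >>> (8 * i)) & 0xff);
        }
    }

    // Gzip-compresses data into out, storing originalName in the FNAME header field.
    static void gzipWithName(OutputStream out, String originalName, byte[] data) throws IOException {
        out.write(new byte[] {
            0x1f, (byte) 0x8b,   // magic number
            8,                   // CM = deflate
            0x08,                // FLG = FNAME
            0, 0, 0, 0,          // MTIME = 0 (no time stamp)
            0,                   // XFL
            (byte) 255           // OS = unknown
        });
        out.write(originalName.getBytes(StandardCharsets.ISO_8859_1)); // RFC 1952: ISO-8859-1 name
        out.write(0);                                                   // name is zero-terminated
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, no zlib wrapper
        DeflaterOutputStream dout = new DeflaterOutputStream(out, deflater);
        dout.write(data);
        dout.finish();           // writes the remaining compressed data without closing out
        deflater.end();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        writeIntLE(out, crc.getValue()); // CRC-32 of the uncompressed data
        writeIntLE(out, data.length);    // ISIZE: uncompressed size modulo 2^32
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        gzipWithName(bytes, "test.xml", "<root/>".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + bytes.size() + " bytes");
    }
}
```

Writing this to a `FileOutputStream` for "D://test.gz" with `"test.xml"` as the name should let tools that honour FNAME (for example `gzip -dN`) restore test.xml whatever the .gz file is called.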

rretzbach
  • Working link to the gzip format specification: https://www.ietf.org/rfc/rfc1952.txt. In the header, the FNAME flag (bit 3, value 0x08) is set to indicate that a file name is present. Note that the standard says the file name must be ISO-8859-1 (LATIN-1) encoded, and lowercased if the file system is case-insensitive. If the data was not compressed from a named file (e.g. from stdin), no file name should be set. – timh Jan 20 '23 at 09:08
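
To check which name a given .gz file actually carries, the header can be read back following the same specification. This is a minimal sketch (class and method names `GzipNameReader`, `readOriginalName` are hypothetical): it validates the magic number, skips the optional FEXTRA field if present (it precedes FNAME), and returns the zero-terminated ISO-8859-1 name, or null when FNAME is not set.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GzipNameReader {
    static final int FEXTRA = 0x04;
    static final int FNAME = 0x08;

    // Returns the FNAME stored in a gzip header, or null if the header carries none.
    static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] fixed = new byte[10];
        data.readFully(fixed);                                  // fixed 10-byte header
        if ((fixed[0] & 0xff) != 0x1f || (fixed[1] & 0xff) != 0x8b) {
            throw new IOException("not a gzip stream");
        }
        int flg = fixed[3] & 0xff;
        if ((flg & FEXTRA) != 0) {                              // optional extra field comes first
            int xlen = data.readUnsignedByte() | (data.readUnsignedByte() << 8); // little-endian length
            data.readFully(new byte[xlen]);                     // skip its contents
        }
        if ((flg & FNAME) == 0) {
            return null;
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) {            // name is zero-terminated
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readOriginalName(in));
        }
    }
}
```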

You can also package the file with an ArchiveOutputStream (such as a tar stream) before gzipping it.

YMomb

Use ZipOutputStream with a ZipEntry instead of GZIPOutputStream, so that the original file name, including its extension, is kept.

Sample code:

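try (ZipOutputStream zipOutStream = new ZipOutputStream(new FileOutputStream(zipFile));
     FileInputStream inStream = new FileInputStream(file)) { // stream to read the file
    zipOutStream.putNextEntry(new ZipEntry(file.getName())); // entry name keeps ".xml", without directories
    byte[] buffer = new byte[4096];
    int n;
    while ((n = inStream.read(buffer)) != -1) zipOutStream.write(buffer, 0, n); // copy the file data
    zipOutStream.closeEntry();
}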
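
A quick in-memory round trip shows that the entry name, extension included, survives in a ZIP archive and can be read back with ZipInputStream. The class and method names (`ZipNameRoundTrip`, `zip`, `firstEntryName`) are hypothetical. Note that this produces a ZIP archive, not a gzip file, so it only helps if the receiving system accepts ZIP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipNameRoundTrip {
    // Stores data as a single ZIP entry called entryName and returns the archive bytes.
    static byte[] zip(String entryName, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(data);
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns the name of the first entry in a ZIP archive, or null if it has none.
    static String firstEntryName(byte[] archive) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = zin.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zip("test.xml", "<root/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstEntryName(archive)); // test.xml
    }
}
```
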
Martijn Pieters
santharao