
My goal is to encode a file and zip it into a folder in Java. I have to use Apache's Commons Codec library. I am able to encode and zip it, and that works fine, but when I decode it back to its original form, it looks like the file has not been completely encoded: a few parts seem to be missing. Can anybody tell me why this happens?

I am also attaching the relevant part of my code for reference, so that you can guide me accordingly.

private void zip() {
    int BUFFER_SIZE = 4096;
    byte[] buffer = new byte[BUFFER_SIZE];

    try {
        // Create the ZIP file
        String outFilename = "H:\\OUTPUT.zip";
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
                outFilename));

        // Compress the files
        for (int i : list.getSelectedIndices()) {
            System.out.println(vector.elementAt(i));
            FileInputStream in = new FileInputStream(vector.elementAt(i));
            File f = vector.elementAt(i);

            // Add ZIP entry to output stream.
            out.putNextEntry(new ZipEntry(f.getName()));

            // Transfer bytes from the file to the ZIP file
            int len;

            while ((len = in.read(buffer)) > 0) {
                buffer = org.apache.commons.codec.binary.Base64
                        .encodeBase64(buffer);
                out.write(buffer, 0, len);

            }

            // Complete the entry
            out.closeEntry();
            in.close();

        }

        // Complete the ZIP file
        out.close();
    } catch (IOException e) {
        System.out.println("caught exception");
        e.printStackTrace();
    }
}
bluish
dmurali
  • Can you provide some samples that show what you put in, what you got out, and what you expected to get out? – Anonymoose Mar 13 '12 at 09:25
  • I don't think it has anything to do with your issue, but your `in.read` test should probably be `in.read(buffer) > -1`, as that's what the api javadoc states. The javadoc doesn't say that `0` means end-of-stream. http://docs.oracle.com/javase/1.4.2/docs/api/java/io/InputStream.html#read%28byte[]%29 – Paul Grime Mar 13 '12 at 09:25
  • If our hearts are pure, we can stamp out base64 in our lifetime. – Prof. Falken Mar 13 '12 at 09:28
  • I don't really understand why you want to Base64-encode the data to put it in the zip file, but apart from that you have several other problems. You read len bytes and Base64-encode them (now you have more than len bytes), and then you write only len bytes, so the last part of your data is skipped. Also, when your read does not fill the whole array (such as for the last part of the file), you only want to encode the actual bytes, or you will get trailing 0s. – Roger Lindsjö Mar 13 '12 at 09:31
  • Thank you all for your quick responses! @Anonymoose: The original file which I wanted to encode was "1. http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_encodings 2. http://www.rgagnon.com/javadetails/java-0598.html 3. http://www.codecodex.com/wiki/Encode/Decode_to/from_Base64 4. http://www.javatips.net/blog/2011/08/how-to-encode-and-decode-in-base64-using-java" and in the encoded text I got, the last link was missing; in other words, when I decoded it, I could see only the first 3 links and not the 4th one. – dmurali Mar 13 '12 at 09:37
  • @RogerLindsjö: I tried replacing it with in.read(buffer) > -1, but it does not work! It just gives me the same error again :( – dmurali Mar 13 '12 at 09:38
  • So why are you Base64-encoding the data INSIDE the ZIP again? It doesn't even make sense to me. – pap Mar 13 '12 at 09:40
  • What I mentioned earlier was just part of my task. My complete task is to add and remove files from the JFileChooser to my JList and select the ones which I have to zip and encode. When I click the 'zip and code' button, it should automatically zip the selected files after encoding them, which means that when I open the zip folder, I should only have files which are completely encoded. (The files can be of any type: .txt, .java, etc.) – dmurali Mar 13 '12 at 09:42
  • @dmurali Yes, you get a new, larger, array, but you still only write len bytes of it. So if you read 3 bytes, encode it into 4 bytes, and then only save the first 3 bytes of the encoded message, then some data is lost. – Roger Lindsjö Mar 13 '12 at 09:42
  • @pap: So you mean that the folder itself should be encoded, and not just the files, or have I misunderstood? – dmurali Mar 13 '12 at 09:43
  • @dmurali I'm saying I don't see why you are bothering to Base64-encode your data at all. After all, ZIP compression is really converting from one *binary* format to another. The encoding doesn't add anything except more bytes and, possibly, a worse compression ratio. Try without it. – pap Mar 13 '12 at 09:49
  • "I am able to encode and zip it" Why? Encoding it will make it bigger, zipping it will make it smaller again. What exactly is the point of all the extra I/O? – user207421 Mar 13 '12 at 10:05
  • @pap: Yeah, I see your point! Now, is it possible to zip the files into a folder and then do the encoding after that? I mean, at the moment the encoding is done first and then the encoded files are zipped into a folder. Can it be done the other way round? If yes, can you guide me on how? Thanks in advance! – dmurali Mar 16 '12 at 11:55
  • @EJP: Yes, I do get your point! But now I am lost about how to go about it! – dmurali Mar 16 '12 at 11:55
  • If you had really got my point, you wouldn't be wanting to 'go about it' at all. You would be sending the requirement back where it came from, labelled 'pointless'. – user207421 Mar 17 '12 at 05:11
  • [See this link](http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html) – Ankur Loriya Mar 13 '12 at 09:35
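To make the length mismatch described in the comments concrete, here is a minimal sketch of what the loop in the question effectively does with one chunk. It assumes Java 8+ and uses the JDK's java.util.Base64 instead of Commons Codec (whose Base64.encodeBase64(byte[]) should produce the same unchunked, padded output); the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class TruncationDemo {
    // Base64 turns every 3 input bytes into 4 output bytes (padding with '='),
    // so n input bytes always become 4 * ceil(n / 3) encoded bytes.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    // Mimics the question's loop: encode a chunk of len bytes, but keep
    // only the first len bytes of the (longer) encoded result.
    static byte[] encodeAndTruncate(byte[] chunk) {
        byte[] encoded = Base64.getEncoder().encode(chunk);
        return Arrays.copyOf(encoded, chunk.length);
    }

    public static void main(String[] args) {
        byte[] chunk = "hello world".getBytes(StandardCharsets.US_ASCII); // 11 bytes
        byte[] written = encodeAndTruncate(chunk);
        byte[] decoded = Base64.getDecoder().decode(written);
        System.out.println("encoded length: " + encodedLength(chunk.length)); // 16
        System.out.println("bytes written:  " + written.length);              // 11
        System.out.println("decodes to:     " + new String(decoded, StandardCharsets.US_ASCII)); // hello wo
    }
}
```

An 11-byte chunk encodes to 16 bytes, only 11 of them are written, and decoding those gives back just "hello wo": the tail of every chunk is silently dropped.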

3 Answers


Base64-encoded data is longer than the source data (every 3 input bytes become 4 output bytes), but you are using the length of the source data when writing the encoded data to the output stream.

You have to use the size of the generated array instead of your variable len.

Second, do not reassign buffer each time you encode a chunk; just write the result to the output.

 while ((len = in.read(buffer)) > 0)  {                         
     byte [] enc = Base64.encodeBase64(Arrays.copyOf(buffer, len));
     out.write(enc, 0, enc.length);
 }

UPDATE: Use Arrays.copyOf(...) so that only the len bytes actually read are passed to the encoder.
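The loop above still encodes each chunk on its own. If a chunk's length is not a multiple of 3 (4096 is not), the encoder pads it with '=', so for files larger than one buffer the padding ends up in the middle of the output, where a decoder will typically either reject it or stop reading. A minimal sketch that keeps the chunked approach, assuming a buffer size that is a multiple of 3 and a local helper (readFully here, a hypothetical name, not a library method) that keeps reading until the buffer is full, since a single in.read may return fewer bytes than requested. It uses java.util.Base64 (Java 8+) in place of Commons Codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

class ChunkedBase64 {
    // A multiple of 3, so that no chunk except the last one needs '=' padding.
    static final int BUFFER_SIZE = 3 * 1024;

    // Hypothetical local helper: keeps reading until the buffer is full or the
    // stream ends, and returns the byte count (0 at end of stream).
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    static void encode(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int len;
        while ((len = readFully(in, buffer)) > 0) {
            // Encode only the len bytes read, and write the whole encoded array.
            byte[] enc = Base64.getEncoder().encode(Arrays.copyOf(buffer, len));
            out.write(enc, 0, enc.length);
        }
    }

    // Convenience wrapper for in-memory data.
    static byte[] encodeBytes(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            encode(new ByteArrayInputStream(data), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000]; // spans several 3072-byte chunks
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // Chunked output matches a single whole-input encoding.
        System.out.println(Arrays.equals(encodeBytes(data), Base64.getEncoder().encode(data))); // true
    }
}
```

With those two conditions only the final chunk carries padding, so the concatenated output is identical to encoding the whole input in one call, which is what main checks.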

bluish
DRCB
  • Hi. Now I am again facing a problem encoding and decoding a file. When a file is really small, it is encoded and decoded properly, but it does not work for larger files. For example, my file is just 7.28 KB, but when I decode it back to its original form, only the first half is decoded properly, while the second half comes back as encoded text :( Do you think this is because of the buffer size? I have specified it as 'byte[] encodedBuf = new byte[1024];' – dmurali Mar 14 '12 at 18:58
  • I guess it depends on how you decode the file. Additionally, I have found another problem in the code: the length of the source data is not passed to the encoder. This will definitely cause problems if the source file length is not a multiple of your buffer length. – DRCB Mar 15 '12 at 08:57
  • Everything works fine now! The corrected code is 'byte encodedBuf[] = new byte[(int) f.length()]; in.read(encodedBuf); byte enc [] = org.apache.commons.codec.binary.Base64.encodeBase64(encodedBuf); out.write(enc, 0, enc.length); in.close();' – dmurali Mar 15 '12 at 09:48
  • One last question: is it possible to zip the files into a folder and do the encoding after that? I mean, at the moment the encoding is done first and then the encoded files are zipped into a folder. Can it be done the other way round? If yes, can you tell me how? – dmurali Mar 15 '12 at 23:29

Your main problem is that Base64 encoding cannot simply be applied block-wise (especially not with the Apache Commons implementation), because each block is encoded and padded on its own. This problem is made worse by the fact that you don't even know how large your blocks are, as that depends on the number of bytes returned by in.read(..).

Therefore you have two alternatives:

  1. Load the complete file into memory and then apply the Base64 encoding.
  2. Use an alternative Base64 encoder implementation that works on streams (the Apache Batik project seems to contain such an implementation: org.apache.batik.util.Base64EncoderStream).
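The second alternative can also be done with the JDK alone: since Java 8, java.util.Base64.Encoder.wrap(OutputStream) returns a stream that encodes on the fly, keeps incomplete 3-byte groups between write calls, and writes the final padding when it is closed. (Commons Codec 1.4 and later also include org.apache.commons.codec.binary.Base64OutputStream.) Because closing the wrapper also closes the stream underneath, this sketch puts a small non-closing filter between it and the ZipOutputStream so further entries can still be added; the class, method and entry names are illustrative assumptions, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StreamingZipEncoder {
    // Forwards writes but turns close() into flush(), so the ZIP stream stays open.
    static final class NonClosingOutputStream extends FilterOutputStream {
        NonClosingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // bulk write instead of FilterOutputStream's per-byte loop
        }

        @Override
        public void close() throws IOException {
            flush();
        }
    }

    // Adds one ZIP entry containing the Base64 encoding of everything read from `in`.
    static void addEncodedEntry(ZipOutputStream zip, String name, InputStream in) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        try (OutputStream enc = Base64.getEncoder().wrap(new NonClosingOutputStream(zip))) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = in.read(buffer)) != -1) {
                enc.write(buffer, 0, len); // partial 3-byte groups are kept until the next write
            }
        } // closing the encoder writes the final group and its padding
        zip.closeEntry();
    }

    // In-memory round trip: zip `data` as one encoded entry, read it back, decode it.
    static byte[] roundTrip(byte[] data) {
        try {
            ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(zipBytes)) {
                addEncodedEntry(zip, "data.txt", new ByteArrayInputStream(data));
            }
            try (ZipInputStream unzip = new ZipInputStream(new ByteArrayInputStream(zipBytes.toByteArray()))) {
                unzip.getNextEntry();
                ByteArrayOutputStream encoded = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int len;
                while ((len = unzip.read(buffer)) != -1) {
                    encoded.write(buffer, 0, len);
                }
                return Base64.getDecoder().decode(encoded.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        System.out.println(Arrays.equals(roundTrip(data), data)); // true
    }
}
```

Memory use stays bounded by the buffer size regardless of file size, and the read loop can use any buffer length because the encoder, not the caller, takes care of 3-byte alignment.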
Robert

When you read the file content into buffer, you get len bytes. When you Base64-encode these, you get more than len bytes, but you still only write len bytes to the file. This means that the last part of each chunk you read will be truncated.

Also, if your read does not fill the entire buffer, you should not Base64-encode more than len bytes, as you will otherwise also encode the trailing 0s (or leftover bytes from an earlier read) sitting in the unused part of the buffer.
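A small illustration of the trailing-zeros problem, assuming Java 8+ and java.util.Base64 (the helper name encodeFirstRead is hypothetical): a 3-byte input read into an 8-byte buffer fills only part of it, so encoding the whole buffer also encodes five zero bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class PartialReadDemo {
    // Reads once into a fresh buffer of bufferSize bytes, then encodes either only
    // the bytes that read() filled or (the mistake) the entire buffer.
    static String encodeFirstRead(byte[] data, int bufferSize, boolean onlyBytesRead) {
        byte[] buffer = new byte[bufferSize];
        int len = new ByteArrayInputStream(data).read(buffer, 0, buffer.length);
        byte[] toEncode = onlyBytesRead ? Arrays.copyOf(buffer, Math.max(len, 0)) : buffer;
        return Base64.getEncoder().encodeToString(toEncode);
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeFirstRead(data, 8, false)); // YWJjAAAAAAA=
        System.out.println(encodeFirstRead(data, 8, true));  // YWJj
    }
}
```

Encoding the whole buffer gives YWJjAAAAAAA=, which decodes to "abc" followed by five zero bytes, while encoding only the len bytes read gives YWJj.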

Combining the above, this means that you must Base64-encode the whole file at once (read it all into a byte[]) unless you can guarantee that each chunk you read fits exactly into a Base64-encoded message, i.e. every chunk except the last is a multiple of 3 bytes long. If your files are not very large, I would recommend reading the whole file.
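Reading the whole file can be sketched as follows, assuming Java 8+ (Files.readAllBytes is Java 7, java.util.Base64 is Java 8) and using java.util.Base64 in place of Commons Codec. Unlike a single in.read(encodedBuf) call, as in the corrected code from the comments on the first answer, Files.readAllBytes is guaranteed to return the complete file. The method names and temporary paths are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class WholeFileZipEncoder {
    // One ZIP entry per file, each holding the Base64 encoding of the complete file.
    static void zipEncoded(List<Path> files, Path zipPath) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (Path file : files) {
                byte[] content = Files.readAllBytes(file);            // the whole file, never a partial read
                byte[] encoded = Base64.getEncoder().encode(content); // one call: padding only at the end
                out.putNextEntry(new ZipEntry(file.getFileName().toString()));
                out.write(encoded, 0, encoded.length);                // write every encoded byte
                out.closeEntry();
            }
        }
    }

    // Writes `data` to a temporary file, zips it, then reads the entry back and decodes it.
    static byte[] roundTrip(byte[] data) {
        try {
            Path dir = Files.createTempDirectory("zipenc");
            Path file = Files.write(dir.resolve("input.txt"), data);
            Path zip = dir.resolve("OUTPUT.zip");
            dir.toFile().deleteOnExit(); // registered first, so deleted after the files inside it
            file.toFile().deleteOnExit();
            zip.toFile().deleteOnExit();
            zipEncoded(Collections.singletonList(file), zip);
            try (ZipFile zf = new ZipFile(zip.toFile());
                 InputStream in = zf.getInputStream(zf.getEntry("input.txt"))) {
                ByteArrayOutputStream encoded = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int len;
                while ((len = in.read(buffer)) != -1) {
                    encoded.write(buffer, 0, len);
                }
                return Base64.getDecoder().decode(encoded.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "1. first link\n2. second link\n3. third link\n4. fourth link\n".getBytes();
        System.out.println(Arrays.equals(roundTrip(data), data)); // true
    }
}
```

Memory use grows with file size (the file plus its roughly 33% larger encoding are held at once), which is why this fits best when, as the answer says, the files are not very large.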

A smaller problem is that when reading in your loop you should probably check for "> -1", not "> 0", but in this case it does not make a difference.

Roger Lindsjö