
I have a Java program that searches for a folder named after yesterday's date, compresses it to a 7z archive, and then deletes the folder. I have noticed that the archives my program generates are far too big. When I compress the same files (873 KB in total) with 7-Zip File Manager, I get a 5 KB archive, while my program produces a 737 KB archive. I am afraid that my program is not really producing a 7z archive but an ordinary zip file. Is there a way to change my code so that it generates an archive as small as the one 7-Zip File Manager produces?

package SevenZip;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.concurrent.TimeUnit;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;

public class SevenZipUtils {

    public static void main(String[] args) throws InterruptedException, IOException {

        String sourceFolder = "C:/Users/Ferid/Documents/Dates/";
        String outputZipFile = "/Users/Ferid/Documents/Dates";
        int sleepTime = 0;
        compress(sleepTime, outputZipFile, sourceFolder);
    }

    public static boolean deleteDirectory(File directory, int sleepTime) throws InterruptedException {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (null != files) {
                for (int i = 0; i < files.length; i++) {
                    if (files[i].isDirectory()) {
                        deleteDirectory(files[i], sleepTime);
                        System.out.println("Folder deleted: " + files[i]);
                    } else {
                        files[i].delete();
                        System.out.println("File deleted: " + files[i]);
                    }
                }
            }
        }
        TimeUnit.SECONDS.sleep(sleepTime);
        return (directory.delete());
    }

    public static void compress(int sleepTime, String outputZipFile, String sourceFolder)
            throws IOException, InterruptedException {

        // finds folder of yesterdays date
        final Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1); // date of yesterday
        String timeStamp = new SimpleDateFormat("yyyyMMdd").format(cal.getTime()); // format the date
        System.out.println("Yesterday was " + timeStamp);

        if (sourceFolder.endsWith("/")) { // add yesterday folder to sourcefolder path
            sourceFolder = sourceFolder + timeStamp;
        } else {
            sourceFolder = sourceFolder + "/" + timeStamp;
        }

        if (outputZipFile.endsWith("/")) { // add yesterday folder name to outputZipFile path
            outputZipFile = outputZipFile + timeStamp + ".7z"; // path already ends with "/", so no separator is needed
        } else {
            outputZipFile = outputZipFile + "/" + timeStamp + ".7z";
        }

        File file = new File(sourceFolder);

        if (file.exists()) {
            try (SevenZOutputFile out = new SevenZOutputFile(new File(outputZipFile))) {
                addToArchiveCompression(out, file, ".");
                System.out.println("Files successfully compressed");

                deleteDirectory(new File(sourceFolder), sleepTime);
            }
        } else {
            System.out.println("Folder does not exist");
        }
    }

    private static void addToArchiveCompression(SevenZOutputFile out, File file, String dir) throws IOException {
        String name = dir + File.separator + file.getName();
        if (file.isFile()) {
            SevenZArchiveEntry entry = out.createArchiveEntry(file, name);
            out.putArchiveEntry(entry);

            // try-with-resources closes the input stream even if a write fails
            try (FileInputStream in = new FileInputStream(file)) {
                byte[] b = new byte[1024];
                int count;
                while ((count = in.read(b)) > 0) {
                    out.write(b, 0, count);
                }
            }
            out.closeArchiveEntry();
            System.out.println("File added: " + file.getName());
        } else if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    addToArchiveCompression(out, child, name);
                }
            }
            System.out.println("Directory added: " + file.getName());
        } else {
            System.out.println(file.getName() + " is not supported");
        }
    }
}

I am using the Apache Commons Compress library.

EDIT: Here is a link to where I got some of the Apache Commons Compress code from.

Mad Scientist
  • You have more than a 150-fold difference in file size. That could not plausibly result from using regular ZIP format instead of 7Z format. It's large enough that I think it unlikely to be attributable to using compressed entries in one case but not the other, though we don't have enough data to rule that out. The most likely issue here is that the (original) contents of the archives you are comparing differ. – John Bollinger Jan 07 '19 at 14:14
  • Yes, John Bollinger is right. I would compare the Java output with 7z's: is the unpacked image different in size (extra JPEG compression, resizing)? Is an extra .thumbs file created? – Joop Eggen Jan 07 '19 at 14:20
  • That may sound like a stupid question, but can you extract the 5kb archive correctly? – jhamon Jan 07 '19 at 14:24
  • @jhamon Yes, I have tried it now. My original 873 KB folder was extracted without any problems, just as it is from the archive generated by my Java program, so both archives extract the same content. – Mad Scientist Jan 07 '19 at 14:27
  • Didn't work with *7-zip*, but 873 kB, compressed to 737 for *zip* and to 5 kB for *7-zip* seems a bit unreasonable. How many files are in that dir? In how many sub-dirs? What type of files are they? – CristiFati Jan 14 '19 at 09:58
  • @CristiFati 7zip has a very good compression rate so this is usual for 7zip. In that dir are 28 xml files and 24 sub-dirs and each sub-dir has 48 xml files – Mad Scientist Jan 14 '19 at 10:48
  • 7z performs better than zip but not *that* much. However it uses solid compression by default, which is a big saver. I know it's too late, but you can emulate solid compression in zip format using two-pass zip compression, see the edit in my answer (for posterity ;)). – Matthieu Jan 16 '19 at 10:35

3 Answers


Commons Compress is starting a new block in the container file for each archive entry. Note the block counter here:

block-per-file

Not quite the answer you were hoping for, but the docs say it doesn't support "solid compression" - writing several files to a single block. See paragraph 5 in the docs here.
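To illustrate why one block per entry costs so much on many small, similar files, here is a minimal sketch using only the JDK's `Deflater`. It is not 7z or LZMA, and the sample XML content is a made-up stand-in for the files in the question. It compares compressing each file on its own with compressing the concatenation of all files as a single block, which is the idea behind solid compression.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class SolidVsPerFile {

    // Deflates the input at maximum level and returns the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical sample data: many small XML files that differ only slightly.
    static List<byte[]> sampleFiles(int count) {
        List<byte[]> files = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                    + "<record><id>" + i + "</id><status>OK</status>"
                    + "<source>sensor</source></record>\n";
            files.add(xml.getBytes(StandardCharsets.UTF_8));
        }
        return files;
    }

    // Total size when every file is compressed as its own block.
    static int perFileSize(List<byte[]> files) {
        int total = 0;
        for (byte[] file : files) {
            total += deflatedSize(file);
        }
        return total;
    }

    // Size when all files are concatenated and compressed as one block.
    static int solidSize(List<byte[]> files) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] file : files) {
            all.write(file, 0, file.length);
        }
        return deflatedSize(all.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> files = sampleFiles(50);
        System.out.println("Per-file blocks: " + perFileSize(files) + " bytes");
        System.out.println("Single block:    " + solidSize(files) + " bytes");
    }
}
```

With per-file blocks the compressor starts from scratch for every file, so the redundancy shared between files is never exploited. The exact numbers depend on the data and the algorithm.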

A quick look around found a few other Java libraries that support LZMA compression, but I couldn't spot one that could do so within the parent container file format for 7-Zip. Perhaps someone else knows of an alternative...

It sounds like a normal zip file format (e.g. via ZipOutputStream) is not an option?

df778899
  • No, a normal zip file format would be too big sadly – Mad Scientist Jan 11 '19 at 13:31
  • A normal zip file cannot support solid compression because the format doesn't allow it. – ggf31416 Jan 16 '19 at 01:17
  • @ggf31416 you can emulate solid compression by running two passes: the first pass creates a zip with all files and *no compression*, the second pass compresses that single zip file with *max compression* (see the last paragraph of [my answer](https://stackoverflow.com/a/54182607/1098603)). That basically is tgz... – Matthieu Jan 16 '19 at 10:37
  • @Matthieu Yes, that's a good observation, but you can do the tar + compression or the no_compression + compression solid emulation with almost any format, and the small 32KB "dictionary" for standard zip (anything else is not standard zip deflate anymore) means that tar.bz2 or tar.xz or 7z without compression + 7z with compression would have better results – ggf31416 Jan 16 '19 at 16:51

I don't have enough rep to comment anymore so here are my thoughts:

  • I don't see where you set the compression ratio so it could be that SevenZOutputFile uses no (or very low) compression. As @CristiFati said, the difference in compression is odd, especially for text files
  • As noted by @df778899, there is no support for solid compression, which is how the best compression ratio is achieved, so you won't be able to do as well as the 7z command line

That said, if zip really isn't an option, your last resort could be to call the proper command line directly within your program.
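Calling the command line could look roughly like the sketch below. It assumes the `7z` executable is installed and on the `PATH` (on Windows you may need the full path to `7z.exe`). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SevenZipCli {

    // Builds the command line: "a" adds files to an archive, "-t7z" selects the 7z format.
    // Assumption: an executable named "7z" can be found on the PATH.
    static List<String> buildCommand(String archivePath, String sourceFolder) {
        return Arrays.asList("7z", "a", "-t7z", archivePath, sourceFolder);
    }

    // Runs 7z and returns its exit code (0 means success).
    static int compress(String archivePath, String sourceFolder)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(archivePath, sourceFolder))
                .inheritIO() // show 7z's own output in this program's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: SevenZipCli <archive.7z> <sourceFolder>");
            return;
        }
        int exitCode = compress(args[0], args[1]);
        System.out.println("7z exited with code " + exitCode);
    }
}
```

Only delete the source folder if the exit code is 0, otherwise a failed run would lose the data.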

If pure 7z is not mandatory, another option would be to use a "tgz"-like format to emulate solid compression: first pack all files into a single uncompressed container (e.g. tar format, or a zip file with no compression), then compress that single file in zip mode with the standard Java Deflate algorithm. Of course, that will be viable only if the resulting format is recognized by the processes that consume it later.
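A minimal sketch of that two-pass idea, using only `java.util.zip` and in-memory data, might look like this. The sample XML content and the entry name `inner.zip` are made up for illustration, and a real program would stream to files instead of byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class TwoPassZip {

    // Writes all entries into one in-memory zip using the given Deflater level.
    static byte[] zip(Map<String, byte[]> entries, int level) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setLevel(level);
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(entry.getKey()));
                zos.write(entry.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Pass 1 packs the files without compression; pass 2 compresses that container as one entry.
    static byte[] twoPass(Map<String, byte[]> files) {
        byte[] inner = zip(files, Deflater.NO_COMPRESSION);
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("inner.zip", inner); // hypothetical entry name
        return zip(outer, Deflater.BEST_COMPRESSION);
    }

    // Hypothetical sample data: many small XML files that differ only slightly.
    static Map<String, byte[]> sampleFiles(int count) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                    + "<record><id>" + i + "</id><status>OK</status>"
                    + "<source>sensor</source></record>\n";
            files.put("file" + i + ".xml", xml.getBytes(StandardCharsets.UTF_8));
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = sampleFiles(50);
        System.out.println("Single-pass zip: " + zip(files, Deflater.BEST_COMPRESSION).length + " bytes");
        System.out.println("Two-pass zip:    " + twoPass(files).length + " bytes");
    }
}
```

The trade-off is that a consumer has to unpack twice, and Deflate's small window limits how much cross-file redundancy it can exploit on large inputs.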

Matthieu

Use the 7-Zip file archiver instead; it easily compresses an 832 KB file to 26.0 KB:

  1. Get its Jar and SDK.
  2. Pick the .java files related to LZMA compression.
  3. Add run arguments to the project properties: e "D:\\2017ASP.pdf" "D:\\2017ASP.7z", where e stands for encode, followed by the input path and the output path.
  4. Run the project [LzmaAlone.java].

Results

Case 1 (.pdf file): From 33,969 KB to 24,645 KB.

Case 2 (.docx file): From 832 KB to 26.0 KB.

TiyebM
  • Correct, and https://commons.apache.org/proper/commons-compress/apidocs/index.html?org/apache/commons/compress/compressors/xz/XZCompressorOutputStream.html can also be used – Saqib Javed Jan 16 '19 at 07:09