15

I have a fat/uber JAR generated by the Gradle Shadow plugin. I often need to send the fat JAR over the network, so it is convenient for me to send only the delta of the file instead of about 40 MB of data. rsync is a great tool for this purpose. However, a small change in my source code leads to a large change in the final fat JAR, and consequently rsync does not help as much as it could.

Can I convert the fat JAR to an rsync-friendly JAR?

My ideas of a solution/workarounds:

  • Leave the heavy lifting to rsync and somehow tell it that it is working with a compressed file (I didn't find any way to do this).
  • Convert the non-rsyncable jar to an rsyncable jar.
  • Tell Gradle Shadow to generate an rsyncable jar (not possible at the moment).

MartyIX
  • Commenting in case someone answers. I need to know that as well. – Y.Kaan Yılmaz Apr 04 '16 at 07:00
  • Is it an option to send the JAR unpacked with rsync and zip it again on the remote machine? This way rsync should be able to keep the traffic low. – Steffen Harbich Apr 07 '16 at 07:18
  • Well, it is an option. I would prefer to prepare everything on source machine though. I think, this solution would also require quite a lot of unnecessary I/O disk operations. – MartyIX Apr 07 '16 at 07:23
  • As I'm using large jar/war builds that out of the box allow rsync to achieve a major speed-up by saving on transfer (rsyncable, as you called it), I get the impression that you actually have a problem with your build process. Did you verify that the metadata on "unchanged" files is truly kept unchanged (e.g. that the last modification time on classes is the time of the first build after the last change and not just the time of the last build)? – rpy Apr 10 '16 at 07:55

4 Answers

4

There are two ways to do this, both of which involve turning compression off. The Gradle way comes first, then how to turn it off using the jar command.

You can do this using Gradle (this part of the answer actually came from the OP, who replaced the original configuration in build.gradle):

shadowJar {
    zip64 true
    entryCompression = org.gradle.api.tasks.bundling.ZipEntryCompression.STORED
    exclude 'META-INF/*.RSA', 'META-INF/*.SF','META-INF/*.DSA'
    manifest {
        attributes 'Main-Class': 'com.my.project.Main'
    }
}

with

jar {
    manifest {
        attributes(
                'Main-Class': 'com.my.project.Main',
        )
    }
}

task fatJar(type: Jar) {
    manifest.from jar.manifest
    classifier = 'all'
    from {
        configurations.runtime.collect { it.isDirectory() ? it : zipTree(it) }
    } {
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }
    with jar
}

The key thing here is that compression has been turned off, i.e.

org.gradle.api.tasks.bundling.ZipEntryCompression.STORED

You can find the docs here

https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/bundling/ZipEntryCompression.html#STORED

Yes, you can speed it up by about 40% on a new archive and by more than 200% on a jar archive you've already rsync'd. The trick is to not compress the jar so you can take advantage of rsync's chunking algorithm.

I used the following commands to package a directory with a lot of class files, once without compression (the 0 flag) and once with it...

jar cf0 uncompressed.jar .
jar cf  compressed.jar   .

This created the following two jars...

-rw-r--r--  1 rsync jar    28331212 Apr 13 14:11 ./compressed.jar
-rw-r--r--  1 rsync jar    38746054 Apr 13 14:10 ./uncompressed.jar

Note that the size of the uncompressed Jar is about 10MB larger.

I then rsync'd these files and timed them using the following commands. (Note: even turning on rsync's own compression (-z) for the already compressed jar had little effect; I'll explain why later.)

Compressed Jar

time rsync -av -e ssh compressed.jar jar@rsync-server.org:/tmp/

building file list ... done
compressed.jar

sent 28334806 bytes  received 42 bytes  2982615.58 bytes/sec
total size is 28331212  speedup is 1.00

real  0m9.208s
user  0m0.248s
sys 0m0.483s

Uncompressed Jar

time rsync -avz -e ssh uncompressed.jar jar@rsync-server.org:/tmp/

building file list ... done
uncompressed.jar

sent 11751973 bytes  received 42 bytes  2136730.00 bytes/sec
total size is 38746054  speedup is 3.30

real  0m5.145s
user  0m1.444s
sys 0m0.219s

The transfer time dropped by about 44% (5.1 s instead of 9.2 s), so even the first rsync gets a good boost. But what about subsequent rsyncs, after a small change has been made?

I removed one class file of 170 bytes from the directory and recreated the jars. Now they are this size:

-rw-r--r--  1 rsync jar  28330943 Apr 13 14:30 compressed.jar
-rw-r--r--  1 rsync jar  38745784 Apr 13 14:30 uncompressed.jar

Now the timings are very different.

Compressed Jar

building file list ... done
compressed.jar

sent 12166657 bytes  received 31998 bytes  2217937.27 bytes/sec
total size is 28330943  speedup is 2.32

real  0m5.435s
user  0m0.378s
sys 0m0.335s

Uncompressed Jar

building file list ... done
uncompressed.jar

sent 220163 bytes  received 43624 bytes  175858.00 bytes/sec
total size is 38745784  speedup is 146.88

real  0m1.533s
user  0m0.363s
sys 0m0.047s
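
For reference, the "speedup" figure rsync prints is the total file size divided by the number of bytes actually sent and received. A minimal Python sketch that reproduces the four figures from the runs above (the helper name `rsync_speedup` is made up for this illustration):

```python
def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    """Total file size divided by the bytes that actually went over the wire."""
    return total_size / (sent + received)


# (total size, bytes sent, bytes received), copied from the rsync output above
runs = {
    "compressed, first sync": (28331212, 28334806, 42),
    "uncompressed, first sync": (38746054, 11751973, 42),
    "compressed, after small change": (28330943, 12166657, 31998),
    "uncompressed, after small change": (38745784, 220163, 43624),
}

for label, (total, sent, received) in runs.items():
    print(f"{label}: speedup {rsync_speedup(total, sent, received):.2f}")
```

The last run moves roughly 264 KB for a 38.7 MB file, which is where the speedup of 146.88 comes from.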

So we can speed up rsyncing large jar files a lot using this method. The reason for this is related to information theory. Compressing data in effect removes the redundancy from it, i.e. what you're left with looks very much like random data, and the better the compressor, the more redundancy it removes. With most compression algorithms, a small change anywhere in the input therefore has a dramatic effect on the output.

The Zip algorithm effectively makes it harder for rsync to find checksums that are the same on the server and the client, which means it needs to transfer more data. When you leave the jar uncompressed, you're letting rsync do what it's good at: sending less data to sync the two files.

Harry
2

As far as I know, rsyncable gzip works by resetting the Huffman tree and padding to byte boundaries every 8192 bytes of compressed data. This avoids long-range side effects on the compression (rsync takes care of shifted data blocks if they are at least byte-aligned).

In this sense, a jar containing small files (less than 8192 bytes) is already rsyncable, because each file is compressed separately. As a test you could use jar's -0 option (no compression) to check if it helps rsync, but I think it won't.
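
That entries are compressed independently can be checked with a small Python sketch. The entry names and contents below are placeholders, and `build_jar` and `compressed_payload` are hypothetical helpers written for this illustration; timestamps are pinned so that only the payload of one entry differs between the two archives.

```python
import io
import struct
import zipfile

STAMP = (2000, 1, 1, 0, 0, 0)  # fixed timestamp so only the payload differs


def build_jar(entries: dict) -> bytes:
    """Return an in-memory deflated archive holding `entries` (name -> bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, data in entries.items():
            info = zipfile.ZipInfo(name, date_time=STAMP)
            jar.writestr(info, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def compressed_payload(archive: bytes, name: str) -> bytes:
    """Return the raw compressed bytes stored for one entry."""
    with zipfile.ZipFile(io.BytesIO(archive)) as jar:
        info = jar.getinfo(name)
    # The local file header has 30 fixed bytes; name and extra lengths sit at offset 26.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    return archive[start:start + info.compress_size]


before = build_jar({"A.class": b"alpha " * 200, "B.class": b"beta " * 200, "C.class": b"gamma " * 200})
after = build_jar({"A.class": b"alpha " * 200, "B.class": b"BETA! " * 200, "C.class": b"gamma " * 200})

for name in ("A.class", "B.class", "C.class"):
    same = compressed_payload(before, name) == compressed_payload(after, name)
    print(name, "unchanged" if same else "changed")
```

Only the modified entry should be reported as changed; the compressed bytes of its neighbours stay the same, although their position in the archive may shift.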

To improve the rsyncability you need to (at least):

  • Make sure the files are stored in the same order.
  • Make sure the metadata associated with unchanged files is also unchanged, as each file has a local file header. For example, the last modification time is problematic for .class files.
    I am not sure about jar, but zip allows extra fields, some of which may prevent rsync matches, e.g. the last access time in the Unix extension.
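
The two points above can be sketched in Python with the standard `zipfile` module. This is a minimal sketch, not a tested build step: `normalize_jar`, the constant timestamp and the file modes are assumptions of this example, and storing entries uncompressed by default is borrowed from the other answers rather than required by the list above.

```python
import zipfile

FIXED_TIME = (2000, 1, 1, 0, 0, 0)  # arbitrary constant timestamp (an assumption)
# Keep the manifest at the front: some tools only look for it there.
MANIFEST_FIRST = {"META-INF/": 0, "META-INF/MANIFEST.MF": 1}


def normalize_jar(src_path: str, dst_path: str, compression: int = zipfile.ZIP_STORED) -> None:
    """Rewrite a jar with sorted entries, constant metadata and (by default) no compression."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # set() drops duplicate names, which fat jars may contain; read() then
        # returns the entry that the archive's central directory lists last.
        names = sorted(set(src.namelist()), key=lambda n: (MANIFEST_FIRST.get(n, 2), n))
        for name in names:
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.create_system = 3  # always "Unix", regardless of the build host
            if name.endswith("/"):
                info.external_attr = (0o40755 << 16) | 0x10  # directory entry
            else:
                info.external_attr = 0o100644 << 16  # regular file, rw-r--r--
            dst.writestr(info, src.read(name), compress_type=compression)


# Example call (paths are placeholders):
# normalize_jar("build/libs/app-all.jar", "build/libs/app-all-normalized.jar")
```

Two builds with identical contents should then produce byte-identical archives, whatever the original entry order and timestamps were.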

Edit: I did some tests with the following commands:

FILENAME=SomeJar.jar

rm -rf tempdir
mkdir tempdir

unzip ${FILENAME} -d tempdir/

cd tempdir

# set the timestamp to 2000-01-01 00:00
find . -print0 | xargs --null touch -t 200001010000

# normalize file mode bits, maybe not necessary
chmod -R u=rwX,go=rX .

# sort and zip files, without extra fields (-X)
find . -type f -print | sort | zip ../${FILENAME}_normalized  -X -@

cd ..
rm -rf tempdir

rsync stats when the first file contained in the jar/zip is removed:

total: matches=1973  hash_hits=13362  false_alarms=0 data=357859
sent 365,918 bytes  received 12,919 bytes  252,558.00 bytes/sec
total size is 4,572,187  speedup is 12.07

rsync stats when the first file is removed and every timestamp is modified:

total: matches=334  hash_hits=124326  false_alarms=4 data=3858763
sent 3,861,473 bytes  received 12,919 bytes  7,748,784.00 bytes/sec
total size is 4,572,187  speedup is 1.18

So there is a significant difference, but not as much as I expected.

It also seems that changing the file mode does not impact the transfer (maybe because it is stored in the central directory?).

bwt
  • Thank you. I have a basic understanding how rsyncable works. Unfortunately, this is not really answering my question because you don't say how I can do what you propose. I appreciate your input. – MartyIX Apr 07 '16 at 12:20
  • The simplest solution I can think of is to unpack the jar, change the timestamps, and repack it sorted. It depends on what OS you use, e.g. for Linux it would be based on `unzip`, `touch` and `zip`. It is not difficult, but I find it a bit strange that there is no built-in tool that already does that – bwt Apr 07 '16 at 13:11
  • @bwt: I'm using Linux. Can you please show a working example of your approach? – MartyIX Apr 12 '16 at 08:46
  • I added test results – bwt Apr 12 '16 at 14:05
1

Let's take one step back; if you do not create large jars, this ceases to be a problem.

So, if you deploy your dependency jars separately, and you don't jar them into a single fat jar, you've also solved the problem here.

To do that, let's say you have:

  • /foo/yourapp.jar
  • /foo/lib/guava.jar
  • /foo/lib/h2.jar

Then, put in the META-INF/MANIFEST.MF file of yourapp.jar the following entry:

Class-Path: lib/guava.jar lib/h2.jar

And now you can just run java -jar yourapp.jar and it'll work, picking up the dependencies. You can now transfer these files individually with rsync; yourapp.jar will be much smaller, and your dependency jars will usually not have changed, so those won't take much time when rsyncing either.
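
If the manifest is generated by a script rather than by the build tool, the entry can be produced along these lines. This is a small Python sketch under assumptions: `build_manifest` and `fold` are hypothetical helpers, the main class name is a placeholder, and the folding follows the JAR specification's rule that a manifest line may not exceed 72 bytes, with continuation lines starting with a single space. Only ASCII names are handled.

```python
from typing import Iterable

LINE_LIMIT = 72  # maximum manifest line length in bytes, per the JAR specification


def fold(line: str) -> str:
    """Fold one ASCII manifest line into chunks of at most 72 bytes."""
    chunks = [line[:LINE_LIMIT]]
    rest = line[LINE_LIMIT:]
    while rest:
        # Continuation lines start with a single space, which counts toward the limit.
        chunks.append(" " + rest[:LINE_LIMIT - 1])
        rest = rest[LINE_LIMIT - 1:]
    return "\r\n".join(chunks) + "\r\n"


def build_manifest(main_class: str, jars: Iterable[str]) -> str:
    """Return manifest text whose Class-Path lists `jars` relative to the main jar."""
    headers = [
        "Manifest-Version: 1.0",
        "Main-Class: " + main_class,
        "Class-Path: " + " ".join(jars),
    ]
    return "".join(fold(header) for header in headers)


print(build_manifest("com.my.project.Main", ["lib/guava.jar", "lib/h2.jar"]))
```

The Class-Path entries are separated by spaces and resolved relative to the location of yourapp.jar, which is why lib/ sits next to it in the layout above.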

I'm aware this doesn't directly answer the actual asked question, but I bet in 90%+ of the times this question comes up, not fatjarring is the appropriate answer.

NB: Ant, Maven, Gradle, etc. can take care of putting the right manifest entry in. If the intent of your jar is not to run it, but, for example, it's a war for a web servlet container, those have their own rules for how to specify where your dependency jars live.

rzwitserloot
1

I replaced my original configuration code in build.gradle:

shadowJar {
    zip64 true
    entryCompression = org.gradle.api.tasks.bundling.ZipEntryCompression.STORED
    exclude 'META-INF/*.RSA', 'META-INF/*.SF','META-INF/*.DSA'
    manifest {
        attributes 'Main-Class': 'com.my.project.Main'
    }
}

with

jar {
    manifest {
        attributes(
                'Main-Class': 'com.my.project.Main',
        )
    }
}

task fatJar(type: Jar) {
    manifest.from jar.manifest
    classifier = 'all'
    from {
        configurations.runtime.collect { it.isDirectory() ? it : zipTree(it) }
    } {
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }
    with jar
}

(Using the solution posted here https://stackoverflow.com/a/31426413/99256)

The final fatJar is much larger (56 MB) than what the Shadow plugin produced for me (35 MB). However, the final jar seems to be rsyncable (when I make a tiny change in my source code, rsync transfers only a very small amount of data).

Please note that I have very limited knowledge of Gradle, so this is just my observation and it may be possible to improve on it further.

MartyIX
  • It turned off compression which is what I did in my answer. See docs on this... org.gradle.api.tasks.bundling.ZipEntryCompression.STORED here https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/bundling/ZipEntryCompression.html#STORED – Harry Apr 14 '16 at 06:18
  • @Harry If you integrate my answer (a solution for gradle) to your answer (to make it a complete answer from my perspective), I'll gladly award you the bounty as I like your answer in general. – MartyIX Apr 14 '16 at 06:20
  • I've just updated it. Does that help? If you want any more changes to the answer, just let me know. As a side note, if you're interested in compression, have a look at the Hutter Prize http://prize.hutter1.net/. I lost myself for several months messing around with compression when I found it. – Harry Apr 14 '16 at 06:20
  • It looks good, thanks! I'll have a look but unfortunately I don't have months of free time to invest. :) – MartyIX Apr 14 '16 at 06:28