I'm using xsbt-proguard-plugin, which is an SBT plugin for working with Proguard.
I'm trying to come up with a Proguard configuration for a Hive Deserializer I've written, which has the following dependencies:
    // project/Dependencies.scala
    val hadoop     = "org.apache.hadoop"         %  "hadoop-core"     % V.hadoop
    val hive       = "org.apache.hive"           %  "hive-common"     % V.hive
    val serde      = "org.apache.hive"           %  "hive-serde"      % V.hive
    val httpClient = "org.apache.httpcomponents" %  "httpclient"      % V.http
    val logging    = "commons-logging"           %  "commons-logging" % V.logging
    val specs2     = "org.specs2"                %% "specs2"          % V.specs2 % "test"
Plus an unmanaged dependency:
    // lib/UserAgentUtils-1.6.jar
Because most of these are either only needed for local unit testing or already available within a Hadoop/Hive environment, I want my minified jar file to include only:
- The Java classes SnowPlowEventDeserializer.class and SnowPlowEventStruct.class
- org.apache.httpcomponents.httpclient
- commons-logging
- lib/UserAgentUtils-1.6.jar
But I'm really struggling to get the syntax right. Should I start from a whitelist of classes I want to keep, or explicitly filter out the Hadoop/Hive/Serde/Specs2 libraries? I'm aware of this SO question but it doesn't seem to apply here.
If I initially try the whitelist approach:
    // Should be equivalent to sbt> package
    import ProguardPlugin._
    lazy val proguard = proguardSettings ++ Seq(
      proguardLibraryJars := Nil,
      proguardOptions := Seq(
        "-keepattributes *Annotation*,EnclosingMethod",
        "-dontskipnonpubliclibraryclassmembers",
        "-dontoptimize",
        "-dontshrink",
        "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
        "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
      )
    )
Then I get a Hadoop processing error, so clearly Proguard is still trying to bundle Hadoop:
    proguard: java.lang.IllegalArgumentException: Can't find common super class of [[Lorg/apache/hadoop/fs/FileStatus;] and [[Lorg/apache/hadoop/fs/s3/Block;]
Meanwhile, if I try Proguard's filtering syntax to build up a blacklist of the libraries I don't want to include:
    import ProguardPlugin._
    lazy val proguard = proguardSettings ++ Seq(
      proguardLibraryJars := Nil,
      proguardOptions := Seq(
        "-keepattributes *Annotation*,EnclosingMethod",
        "-dontskipnonpubliclibraryclassmembers",
        "-dontoptimize",
        "-dontshrink",
        "-injars !*hadoop*.jar"
      )
    )
Then this doesn't seem to work either:
    proguard: java.io.IOException: Can't read [/home/dev/snowplow-log-deserializers/!*hadoop*.jar] (No such file or directory)
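For reference, the ProGuard manual writes a filter in parentheses after a class path entry rather than as a bare argument, which may be why the string above is treated as a file name. A minimal sketch of that shape follows. The jar paths and the value name are hypothetical, not taken from my build, and whether this can exclude whole dependency jars under the plugin is exactly what I'm unsure about:

```scala
// Sketch only: the jar paths below are hypothetical, not taken from my build.
val filterShapedOptions: Seq[String] = Seq(
  // A filter follows the class path entry in parentheses and matches
  // entries inside that jar; here everything under META-INF is excluded.
  "-injars target/my-deserializer.jar(!META-INF/**)",
  // Jars listed as library jars are read for reference but are not
  // written to the output jar.
  "-libraryjars lib_managed/hadoop-core.jar"
)
```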
Any help greatly appreciated!