I'm developing an Apache Spark application in Scala 2.11 using SBT 1.3.10. I work in an IDE on my local machine without Spark/Hadoop/Hive installed; instead, I added them as SBT dependencies (Hadoop 3.1.2, Spark 2.4.5, Hive 3.1.2). My SBT dependencies are below:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % "2.4.5",
"org.apache.hadoop" % "hadoop-client" % "3.1.2",
"com.fasterxml.jackson.core" % "jackson-core" % "2.9.10",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.9.10",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.10",
// about these two later in the question
"org.apache.hive" % "hive-exec" % "3.1.2",
"org.apache.commons" % "commons-lang3" % "3.6"
)
In my application I'm reading a sample CSV file into a DataFrame with a provided schema:
val init = spark.read
.format("csv")
.option("header", value = false)
.schema(sampleCsvSchema)
.load("src/main/resources/sample.csv")
init.show(10, false)
At some point I had to add the org.apache.hive:hive-exec:3.1.2 dependency, and I got the following exception during execution:
Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:139)
at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:41)
...
It says that org.apache.commons.lang3.time.FastDatePrinter.parsePattern() cannot parse the Spark timestamp format (org.apache.spark.sql.execution.datasources.csv.CSVOptions.timestampFormat), which defaults to "yyyy-MM-dd'T'HH:mm:ss.SSSXXX". (Note that my sample.csv doesn't contain any timestamp data, but Spark goes through this code path anyway.)
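To make the "XXX" component concrete: in the JDK's SimpleDateFormat pattern syntax it stands for an ISO 8601 time-zone offset such as "+02:00" or "Z". The sketch below only illustrates what that pattern produces, using java.text.SimpleDateFormat from the JDK; it is not the FastDatePrinter code path that Spark actually calls, and the object and method names are made up for this example.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

// Hypothetical helper, for illustration only.
object XxxPatternDemo {
  // The pattern quoted above as Spark's default CSV timestampFormat.
  val Pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

  // Formats an epoch-millisecond instant in the given time zone.
  def format(epochMillis: Long, zoneId: String): String = {
    val fmt = new SimpleDateFormat(Pattern, Locale.ROOT)
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(epochMillis))
  }

  def main(args: Array[String]): Unit = {
    println(format(0L, "UTC"))       // 1970-01-01T00:00:00.000Z
    println(format(0L, "GMT+02:00")) // 1970-01-01T02:00:00.000+02:00
  }
}
```

A FastDatePrinter version that does not recognise "X" rejects the pattern as soon as it is constructed, which matches the stack trace above failing inside the CSVOptions constructor before any data is read.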
Initially, org.apache.commons.lang3.time.FastDatePrinter was brought into the project by the org.apache.commons:commons-lang3:3.6 dependency and worked fine. However, the org.apache.hive:hive-exec:3.1.2 library ships its own copy of that package and class, which cannot parse "XXX" (and it cannot be excluded as a dependency, because it is bundled inside the library itself).
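One way to confirm which jar actually supplies the class at runtime is to ask the JVM for the class's code source. The following is a minimal sketch using only standard JDK reflection calls; the object name ClassOrigin and the method locationOf are hypothetical names chosen for this example.

```scala
import scala.util.Try

// Hypothetical diagnostic helper, for illustration only.
object ClassOrigin {
  // Returns the jar (or directory) a class was loaded from, if it can be determined.
  // Yields None when the class is not on the classpath or has no code source
  // (for example, JDK bootstrap classes).
  def locationOf(className: String): Option[String] =
    for {
      // initialize = false: look the class up without running its static initializers
      cls    <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    val name = "org.apache.commons.lang3.time.FastDatePrinter"
    println(s"$name -> ${locationOf(name).getOrElse("<not found or unknown>")}")
  }
}
```

Run inside the application (or from the SBT console), this should print a path ending in either the commons-lang3 jar or the hive-exec jar, depending on which copy comes first on the classpath.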
So I have two library dependencies that provide two implementations of the same package, and I need to choose a specific one of them at runtime. How can this be done?
P.S. I've found a workaround for this specific "java.lang.IllegalArgumentException: Illegal pattern component: XXX" issue, but I'm more interested in how to resolve this kind of SBT dependency conflict in general.