
I have a Spark job running on an AWS EMR cluster that needs access to a native library (*.so). Per Spark's documentation (https://spark.apache.org/docs/2.3.0/configuration.html), I added the "spark.driver.extraLibraryPath" and "spark.executor.extraLibraryPath" options to the spark-submit command line:

spark-submit \
--class test.Clustering \
--conf spark.executor.extraLibraryPath="/opt/test/lib/native" \
--conf spark.driver.extraLibraryPath="/opt/test/lib/native" \
--master yarn \
--deploy-mode client \
s3-etl-prepare-1.0-SNAPSHOT-jar-with-dependencies.jar "$@"

This works as I expected and my native library is loaded. The problem is that during the Spark job I also need to run a distributed lzo indexer MR job, which needs the lzo native library, and the lzo code cannot load the native GPL library:

21/06/16 09:49:09 ERROR GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1124)
        at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
        at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
        at com.hadoop.compression.lzo.DistributedLzoIndexer.<init>(DistributedLzoIndexer.java:28)
        at test.misc.FileHelper.distributIndexLzoFile(FileHelper.scala:260)
        at test.scalaapp.Clustering$.main(Clustering.scala:66)
        at test.scalaapp.Clustering.main(Clustering.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
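
The library itself should be present on the cluster nodes; a quick way to confirm where it actually lives (the search roots below are just a guess, since the install location varies by EMR release) is:

# Locate the hadoop-lzo native library on a node; paths vary by EMR release.
find /usr/lib /usr/lib64 -name 'libgplcompression*' 2>/dev/null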

It seems the "spark.driver.extraLibraryPath" option overrides or replaces the whole library path rather than appending to it. How can I keep both the GPL lzo native path and my own library path?
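
One workaround I am considering, since extraLibraryPath takes a colon-separated list just like LD_LIBRARY_PATH, is to list both directories explicitly (the /usr/lib/hadoop-lzo/lib/native path below is an assumption based on a typical EMR layout and may differ on other releases):

spark-submit \
--class test.Clustering \
--conf spark.executor.extraLibraryPath="/opt/test/lib/native:/usr/lib/hadoop-lzo/lib/native" \
--conf spark.driver.extraLibraryPath="/opt/test/lib/native:/usr/lib/hadoop-lzo/lib/native" \
--master yarn \
--deploy-mode client \
s3-etl-prepare-1.0-SNAPSHOT-jar-with-dependencies.jar "$@"

Is hard-coding the Hadoop native path like this the right approach, or is there a way to make Spark append to the existing library path instead of replacing it?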
