
I'm trying to execute working code outside of my Eclipse IDE and I'm facing strange errors I can't deal with. To sum up my problems:

  • Executing my code within Eclipse: everything's fine.
  • Capturing the command line generated by Eclipse to run my app and copying it into a shell: everything's fine.

Now, the command line generated by Eclipse to run my app is something like java -cp lots-of-jars -Dvm.params myPackage.MyMainClass app-params.

My objective is to execute my app with Oozie as a Java action, so I need to build an uber jar to reduce lots-of-jars to myapp.jar.
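For context, a minimal Oozie Java action pointing at such an uber jar might look like the sketch below (the action name, properties, and argument are hypothetical; myapp.jar would sit in the workflow's lib/ directory on HDFS):

```xml
<action name="run-myapp">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- the uber jar in lib/ provides this class and all its dependencies -->
        <main-class>myPackage.MyMainClass</main-class>
        <arg>app-params</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>
```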

To do so, I configured the maven shade plugin like this :

         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.2</version>
            <configuration>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                <resource>reference.conf</resource>
                            </transformer>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>es.mycompany.bigdata.OozieAction</mainClass>
                            </transformer>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.PluginXmlResourceTransformer" />
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>

I had to add some transformers because of errors I was facing when starting my app (couldn't create the FsShell Spring bean, couldn't start the SparkContext, ...). By the way, my app's purpose is to download some Azure blobs, put them into HDFS, transform them with Spark, and finally add them to a Hive table. I developed the app in Java (including the Spark part) and used Spring to do so.

Now, my latest problem occurs when I'm trying to create a HiveContext (my SparkContext is OK, as my app works if I omit the Hive part):

@Bean
@Lazy
@Scope("singleton")
public SQLContext getSQLContext(@Autowired JavaSparkContext sparkContext) {
    return new HiveContext(sparkContext);
}

The error thrown is :

2017-04-02 20:20:18 WARN  Persistence:106 - Error creating validator of type org.datanucleus.properties.CorePropertyValidator
ClassLoaderResolver for class "" gave error on creation : {1}
org.datanucleus.exceptions.NucleusUserException: ClassLoaderResolver for class "" gave error on creation : {1}
...
Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:283)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:247)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:225)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:416)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:301)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        ... 93 more
2017-04-02 20:20:18 WARN  ExtendedAnnotationApplicationContext:550 - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'getOozieJavaAction': Unsatisfied dependency expressed through field 'sqlContext'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'getSQLContext' defined in es.mediaset.technology.bigdata.config.FlatJsonToCsvAppConfig: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.apache.spark.sql.SQLContext]: Factory method 'getSQLContext' threw exception; nested exception is java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Since my code runs correctly in Eclipse, and outside Eclipse with a command such as

/usr/java/jdk1.8.0_121/bin/java -Dmode=responsive -Dspark.master=local[*] -Dfile.encoding=UTF-8 -classpath /home/cloudera/workspace-sts/oozie-eventhub-retriever/target/classes:/home/cloudera/workspace-sts/java-framework/target/classes:/home/cloudera/.m2/repository/com/microsoft/azure/azure-storage/5.0.0/azure-storage-5.0.0.jar:<...>:/etc/hive/conf.dist es.mycompany.technology.bigdata.OozieAction json2hive

I suppose my shade configuration is wrong, but I can't understand why, and I can't see what I'm doing wrong...

Thanks

Cheloute
  • and these "loads of jars" you are merging, include the datanucleus-XXX jars? If so, you merged the plugin.xml file(s) from those jars? and the MANIFEST files? – Neil Stockton Apr 03 '17 at 05:51
  • When I execute my app from my terminal with the eclipse command, my classpath includes datanucleus-XXX.jars. And if I open my uber jar, classes from datanucleus-XXX.jars are inside. I added the PluginXmlResourceTransformer as you suggested, but same result. My resulting Manifest file is : Manifest-Version: 1.0 Build-Jdk: 1.7.0_67 Built-By: cloudera Created-By: Apache Maven 3.3.9 Main-Class: es.mycompany.technology.bigdata.OozieAction Archiver-Version: Plexus Archiver – Cheloute Apr 03 '17 at 07:53
  • I've no idea what is a "PluginXmlResourceTransformer" or what it does, so who knows whether it has done the job correctly. If you really want to mung jars together suggest that you look at http://www.datanucleus.org/servlet/forum/viewthread_thread,8020_lastpage,yes#lastpost – Neil Stockton Apr 03 '17 at 07:57
  • Your link took me to this Stack Overflow answer I missed when I looked for help: https://stackoverflow.com/questions/37484239/apache-spark-hive-executable-jar-with-maven-shade. It describes perfectly what I'm looking for. I'll try to reproduce this solution and if it works, I'll edit my question to point to the solution. Thanks for your help! – Cheloute Apr 03 '17 at 08:19
  • Well, following the previous quoted answer, I adapted my maven-shade-plugin and am now facing the following error : org.datanucleus.exceptions.NucleusUserException: Error : Could not find API definition for name "JDO". Perhaps you dont have the requisite datanucleus-api-XXX jar in the CLASSPATH? Of course, datanucleus-api-jdo.jar is in my uber jar.. – Cheloute Apr 03 '17 at 08:53
  • Ok, I misunderstood the PluginXMLResourceTransformer function of the maven Shade plugin, or I don't know how to use it. The fact is my plugin.xml wasn't merged. – Cheloute Apr 03 '17 at 10:07
  • Possible duplicate of [Apache spark Hive, executable JAR with maven shade](http://stackoverflow.com/questions/37484239/apache-spark-hive-executable-jar-with-maven-shade) – Neil Stockton Apr 03 '17 at 12:45

1 Answer


The following Stack Overflow Q/A answers this question: Apache spark Hive, executable JAR with maven shade

For those who don't understand how to "merge" all the plugin.xml files from DataNucleus, you can take this one: plugin.xml, and paste it into your resources folder.
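If you go that route, a complementary step (a sketch, assuming the hand-merged plugin.xml sits in src/main/resources) is to filter the per-jar DataNucleus descriptors out of the shade, so only your merged copy ends up in the uber jar:

```xml
<filter>
    <!-- drop each DataNucleus jar's own plugin.xml; the merged copy
         from src/main/resources is then the only one in the uber jar -->
    <artifact>org.datanucleus:*</artifact>
    <excludes>
        <exclude>plugin.xml</exclude>
    </excludes>
</filter>
```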

Cheloute