
Hello everybody, we have a kerberized HDP (Hortonworks) cluster. We can run Spark jobs from spark-submit (CLI) and from Talend Big Data, but not from Eclipse.

We have a Windows client machine where Eclipse is installed and the MIT Kerberos client for Windows is configured (TGT configuration). The goal is to run the Spark job from Eclipse. The Spark-related portion of the Java code is operational and has been tested via the CLI. Below is the relevant part of the code for the job.

private void setConfigurationProperties()
{
    try {
        sConfig.setAppName("abcd-name");
        sConfig.setMaster("yarn-client");
        sConfig.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        // YARN / MapReduce endpoints on the cluster
        sConfig.set("spark.hadoop.yarn.resourcemanager.address", "rs.abcd.com:8032");
        sConfig.set("spark.hadoop.yarn.resourcemanager.scheduler.address", "rs.abcd.com:8030");
        sConfig.set("spark.hadoop.mapreduce.jobhistory.address", "rs.abcd.com:10020");
        sConfig.set("spark.hadoop.yarn.app.mapreduce.am.staging-dir", "/dir");
        // resource sizing
        sConfig.set("spark.executor.memory", "2g");
        sConfig.set("spark.executor.cores", "4");
        sConfig.set("spark.executor.instances", "24");
        sConfig.set("spark.yarn.am.cores", "24");
        sConfig.set("spark.yarn.am.memory", "16g");
        sConfig.set("spark.eventLog.enabled", "true");
        sConfig.set("spark.eventLog.dir", "hdfs:///spark-history");
        sConfig.set("spark.shuffle.memoryFraction", "0.4");
        sConfig.set("spark.hadoop.mapreduce.application.framework.path", "/hdp/apps/version/mapreduce/mapreduce.tar.gz#mr-framework");
        sConfig.set("spark.local.dir", "/tmp");
        // Kerberos principals of the cluster services
        sConfig.set("spark.hadoop.yarn.resourcemanager.principal", "rm/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.mapreduce.jobhistory.principal", "jhs/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.dfs.namenode.kerberos.principal", "nn/_HOST@ABCD.COM");
        sConfig.set("spark.hadoop.fs.defaultFS", "hdfs://hdfs.abcd.com:8020");
        sConfig.set("spark.hadoop.dfs.client.use.datanode.hostname", "true");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
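For context, the driver builds the SparkContext from this configuration roughly as follows (the class name and main method here are a simplified, hypothetical version of our actual code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class AbcdSparkJob {

        // field used by setConfigurationProperties() above
        private final SparkConf sConfig = new SparkConf();

        public static void main(String[] args) {
            AbcdSparkJob job = new AbcdSparkJob();
            job.setConfigurationProperties();
            // the error below is thrown here, while the YARN client inside
            // the SparkContext is connecting to the ResourceManager
            JavaSparkContext sc = new JavaSparkContext(job.sConfig);
            // ... job logic ...
            sc.stop();
        }

        private void setConfigurationProperties() {
            // as listed above
        }
    }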

When we run the code, the following error appears:

17/04/05 23:37:06 INFO Remoting: Starting remoting

17/04/05 23:37:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@1.1.1.1:54356]

17/04/05 23:37:06 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54356.

17/04/05 23:37:06 INFO SparkEnv: Registering MapOutputTracker

17/04/05 23:37:06 INFO SparkEnv: Registering BlockManagerMaster

17/04/05 23:37:06 INFO DiskBlockManager: Created local directory at C:\tmp\blockmgr-baee2441-1977-4410-b52f-4275ff35d6c1

17/04/05 23:37:06 INFO MemoryStore: MemoryStore started with capacity 2.4 GB

17/04/05 23:37:06 INFO SparkEnv: Registering OutputCommitCoordinator

17/04/05 23:37:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.

17/04/05 23:37:07 INFO SparkUI: Started SparkUI at http://1.1.1.1:4040

17/04/05 23:37:07 INFO RMProxy: Connecting to ResourceManager at rs.abcd.com/1.1.1.1:8032

17/04/05 23:37:07 ERROR SparkContext: Error initializing SparkContext.

org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

17/04/05 23:37:07 INFO SparkUI: Stopped Spark web UI at http://1.1.1.1:4040

Please guide us on how to specify the Kerberos authentication method instead of SIMPLE in the Java code, or how to instruct the client to request Kerberos authentication. What should the whole process look like, and what would be the right approach?

Thank you

    _"SIMPLE authentication is not enabled"_ >> just google that error message, and you will learn that you forgot **some critical Hadoop configuration properties** -- the kind of conf that is available on your Edge Node when you run `spark-submit`, in `/etc/hadoop/conf/core-site.xml`. To begin with, the Hadoop client has no information about the authorization type expected by the server, hence the attempt with SIMPLE by default. – Samson Scharfrichter Apr 06 '17 at 00:11
  • When you provide the expected Hadoop conf, you will then fail because of the required "native libraries" for Kerberos Hadoop implementation on client side, for which there is no official build on Windows >> google that, too. I think I did some answers on that line on SO in the past. – Samson Scharfrichter Apr 06 '17 at 00:14
  • @ Samson Scharfrichter - First of all thank you for reply, but let me clarify some details: 1) **some critical Hadoop configuration properties** - I know that the remote windows client where Eclipse is running is missing some critical configuration, but I could not find out what configuration that is nor how/where to specify them. Please provide more information, because I could not find them on the web. 2) If I get it right, after the first step, we would need to copy on client windows machine **"native libraries"** for Kerberos (no official build). Please provide the links/information – nop01nt Apr 06 '17 at 07:25
  • Have a look at http://stackoverflow.com/questions/42650562/access-a-secured-hive-when-running-spark-in-an-unsecured-yarn-cluster/42651609#42651609 and https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html – Samson Scharfrichter Apr 07 '17 at 09:02
  • And, of course, the Hadoop documentation about default values for `core-site.xml` properties https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml >> check `hadoop.security.authentication` among other things. – Samson Scharfrichter Apr 07 '17 at 09:04
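
Update, summarizing the suggestions from the comments: a minimal client-side sketch might look like the following. This is not a verified fix; the winutils directory, keytab path, and principal are placeholders, and the property values have to mirror the cluster's actual core-site.xml.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {

        public static void main(String[] args) throws IOException {
            // Hadoop on Windows needs winutils.exe under %HADOOP_HOME%\bin;
            // the directory below is a placeholder.
            System.setProperty("hadoop.home.dir", "C:\\hadoop");

            // Client-side equivalent of the core-site.xml settings that
            // spark-submit picks up from /etc/hadoop/conf on the edge node.
            // Without hadoop.security.authentication=kerberos the client
            // falls back to SIMPLE, which the cluster rejects.
            Configuration hadoopConf = new Configuration();
            hadoopConf.set("hadoop.security.authentication", "kerberos");
            hadoopConf.set("hadoop.security.authorization", "true");
            hadoopConf.set("fs.defaultFS", "hdfs://hdfs.abcd.com:8020");

            UserGroupInformation.setConfiguration(hadoopConf);

            // Log in explicitly from a keytab (principal and path are placeholders);
            // alternatively the ticket cache of the MIT Kerberos client could be reused.
            UserGroupInformation.loginUserFromKeytab(
                    "someuser@ABCD.COM", "C:\\keytabs\\someuser.keytab");

            // The same flag can also be passed through SparkConf, next to the
            // properties already set in setConfigurationProperties():
            // sConfig.set("spark.hadoop.hadoop.security.authentication", "kerberos");
        }
    }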

0 Answers