Hello everybody, we have a Kerberized HDP (Hortonworks) cluster. We can run Spark jobs from spark-submit (CLI) and from Talend Big Data, but not from Eclipse.
We have a Windows client machine where Eclipse is installed and the MIT Kerberos client for Windows is configured (a TGT is obtained). The goal is to run a Spark job from Eclipse. The Spark-related portion of the Java code is operational and has been tested via the CLI. The relevant part of the job's code is below.
import org.apache.spark.SparkConf;

private final SparkConf sConfig = new SparkConf();

private void setConfigurationProperties()
{
    sConfig.setAppName("abcd-name");
    sConfig.setMaster("yarn-client");
    sConfig.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    // YARN / MapReduce service endpoints
    sConfig.set("spark.hadoop.yarn.resourcemanager.address", "rs.abcd.com:8032");
    sConfig.set("spark.hadoop.yarn.resourcemanager.scheduler.address", "rs.abcd.com:8030");
    sConfig.set("spark.hadoop.mapreduce.jobhistory.address", "rs.abcd.com:10020");
    sConfig.set("spark.hadoop.yarn.app.mapreduce.am.staging-dir", "/dir");
    // Resource sizing
    sConfig.set("spark.executor.memory", "2g");
    sConfig.set("spark.executor.cores", "4");
    sConfig.set("spark.executor.instances", "24");
    sConfig.set("spark.yarn.am.cores", "24");
    sConfig.set("spark.yarn.am.memory", "16g");
    // Event logging, shuffle and local scratch space
    sConfig.set("spark.eventLog.enabled", "true");
    sConfig.set("spark.eventLog.dir", "hdfs:///spark-history");
    sConfig.set("spark.shuffle.memoryFraction", "0.4");
    sConfig.set("spark.hadoop.mapreduce.application.framework.path", "/hdp/apps/version/mapreduce/mapreduce.tar.gz#mr-framework");
    sConfig.set("spark.local.dir", "/tmp");
    // Kerberos principals of the cluster services
    sConfig.set("spark.hadoop.yarn.resourcemanager.principal", "rm/_HOST@ABCD.COM");
    sConfig.set("spark.hadoop.mapreduce.jobhistory.principal", "jhs/_HOST@ABCD.COM");
    sConfig.set("spark.hadoop.dfs.namenode.kerberos.principal", "nn/_HOST@ABCD.COM");
    // HDFS endpoint
    sConfig.set("spark.hadoop.fs.defaultFS", "hdfs://hdfs.abcd.com:8020");
    sConfig.set("spark.hadoop.dfs.client.use.datanode.hostname", "true");
}
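For context, the configuration above is then used to create the context; the failure in the log below occurs during this initialization (a minimal sketch, the variable name is ours):

import org.apache.spark.api.java.JavaSparkContext;

// Creating the context is where the AccessControlException below is thrown.
JavaSparkContext jsc = new JavaSparkContext(sConfig);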
When we run the code, the following error appears:
17/04/05 23:37:06 INFO Remoting: Starting remoting
17/04/05 23:37:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@1.1.1.1:54356]
17/04/05 23:37:06 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54356.
17/04/05 23:37:06 INFO SparkEnv: Registering MapOutputTracker
17/04/05 23:37:06 INFO SparkEnv: Registering BlockManagerMaster
17/04/05 23:37:06 INFO DiskBlockManager: Created local directory at C:\tmp\blockmgr-baee2441-1977-4410-b52f-4275ff35d6c1
17/04/05 23:37:06 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
17/04/05 23:37:06 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/05 23:37:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/04/05 23:37:07 INFO SparkUI: Started SparkUI at http://1.1.1.1:4040
17/04/05 23:37:07 INFO RMProxy: Connecting to ResourceManager at rs.abcd.com/1.1.1.1:8032
17/04/05 23:37:07 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
17/04/05 23:37:07 INFO SparkUI: Stopped Spark web UI at http://1.1.1.1:4040
Please guide us on how to specify the Kerberos authentication method (instead of SIMPLE) in the Java code, or how to instruct the client to request Kerberos authentication. What should the overall process look like, and what would be the right approach?
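From what we have read so far, it seems the Hadoop client libraries on the driver must be switched into Kerberos mode and a login must happen before the SparkContext is created. Is something along these lines the right direction? This is an untested sketch on our side; the principal, keytab, and krb5.ini paths are placeholders for our environment:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Point the JVM at the MIT Kerberos configuration on the Windows client
// (placeholder path).
System.setProperty("java.security.krb5.conf", "C:/ProgramData/MIT/Kerberos5/krb5.ini");

// Switch the Hadoop client from SIMPLE to KERBEROS authentication
// before any SparkContext / Hadoop RPC is created.
Configuration hadoopConf = new Configuration();
hadoopConf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(hadoopConf);

// Log in with a placeholder principal and keytab.
UserGroupInformation.loginUserFromKeytab("user@ABCD.COM", "C:/security/user.keytab");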
Thank you