
I have a Spark machine running in standalone mode. It has a Spark job that writes to kerberized HDFS.

Based on the Cloudera documentation, standalone Spark can't connect to kerberized HDFS. Is this true? https://www.cloudera.com/documentation/enterprise/5-5-x/topics/sg_spark_auth.html

My Spark node is not kerberized. Do I need to switch to YARN mode to write to kerberized HDFS? Does my Spark cluster also need to be kerberized to connect to HDFS?

I have asked about this before, but none of the suggestions there worked for me: Kinit with Spark when connecting to Hive
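
For context, here is a minimal sketch of the kind of write I am attempting, with a keytab login via Hadoop's `UserGroupInformation` (the principal, keytab path, and host names below are placeholders, not my real values):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

object KerberizedHdfsWrite {
  def main(args: Array[String]): Unit = {
    // Switch the Hadoop client to Kerberos before touching HDFS
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)

    // Placeholder principal and keytab -- replace with real values
    UserGroupInformation.loginUserFromKeytab(
      "sparkuser@EXAMPLE.COM",
      "/etc/security/keytabs/sparkuser.keytab")

    val spark = SparkSession.builder()
      .master("spark://spark-master:7077") // standalone master (placeholder host)
      .appName("kerberized-hdfs-write")
      .getOrCreate()

    // This login only authenticates the driver process; executors on
    // other standalone workers still have no Kerberos credentials
    spark.range(100).write.parquet("hdfs://namenode:8020/tmp/spark-out")

    spark.stop()
  }
}
```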

  • Can you quote the sentence in the Cloudera documentation asserting that _"Standalone Spark can't connect to kerberized HDFS"_? And BTW what do you mean exactly by "standalone" Spark, to avoid any ambiguity? – Samson Scharfrichter Jul 06 '17 at 20:09
  • From the doc: _"Important: If you want to enable Spark event logging on a Kerberos-enabled cluster, you will need to enable Kerberos authentication for Spark as well, since Spark's event logs are written to HDFS. You can use Spark on a Kerberos-enabled cluster only in the YARN mode, not in the Standalone mode."_ – AKC Jul 06 '17 at 20:14
  • _"My Spark node is not kerberized"_ > what do you mean exactly by "kerberized node" -- you did not copy a valid `/etc/krb5.conf` to that node ? you have Kerberos errors because of the system clock and/or DNS inconsistencies ? you have no privilege to (or don't know how to) install `kinit` but don't want to let Spark use the Java libraries with `--principal` and `--keytab`? – Samson Scharfrichter Jul 06 '17 at 20:17
  • that link was valid for CDH **5.5** and an old version of Spark. Edit the URL to point to CDH **5.10** and look for any reference to Standalone mode... – Samson Scharfrichter Jul 06 '17 at 20:23
  • Please look closely at the final comment of Steve Loughran (HortonWorks) on https://stackoverflow.com/questions/42650562/access-a-secured-hive-when-running-spark-in-an-unsecured-yarn-cluster, which explains how to solve edge cases with `spark.yarn.access.namenodes` (note that you *do* need to put the Hadoop client libs in the CLASSPATH, as well as the directory containing Hadoop conf files) – Samson Scharfrichter Jul 06 '17 at 20:26
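
Putting the comments' suggestions together, the YARN-mode submission they point at would look roughly like the sketch below; the principal, keytab, namenode URI, and job class are placeholders, not values confirmed by this question:

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal sparkuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/sparkuser.keytab \
  --conf spark.yarn.access.namenodes=hdfs://namenode:8020 \
  --class com.example.KerberizedHdfsWrite \
  kerberized-hdfs-write.jar
```

With `--principal` and `--keytab`, YARN handles the Kerberos login and delegation-token renewal for the job, which is what standalone mode lacks.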

0 Answers