
I am a Spark newbie, using Windows 10 and trying to get Spark to work. I have set the environment variables correctly (sketched below, after the log), and I also have winutils. When I go into spark/bin and type spark-shell, it runs Spark, but it gives the following errors.

It also doesn't show the Spark context or the Spark session.

C:\Users\Akshay\Downloads\spark\bin>spark-shell
    17/06/19 23:45:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/06/19 23:45:19 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/Akshay/Downloads/spark/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/Akshay/Downloads/spark/jars/datanucleus-api-jdo-3.2.6.jar."
    17/06/19 23:45:20 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/Akshay/Downloads/spark/bin/../jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/Akshay/Downloads/spark/jars/datanucleus-core-3.2.10.jar."
    17/06/19 23:45:20 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/Akshay/Downloads/spark/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/Akshay/Downloads/spark/jars/datanucleus-rdbms-3.2.9.jar."
    java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
      at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
      at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
      at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
      at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
      ... 47 elided
    Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
      at java.lang.reflect.Constructor.newInstance(Unknown Source)
      at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
      ... 58 more
    Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
      at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
      at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
      at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
      at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
      at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
      at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
      at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
      ... 63 more
    Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
      at java.lang.reflect.Constructor.newInstance(Unknown Source)
      at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
      ... 71 more
    Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
      at java.lang.reflect.Constructor.newInstance(Unknown Source)
      at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
      at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
      at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
      at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
      ... 76 more
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
      at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
      at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
      ... 84 more
    Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
      at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
      at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
      at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
      ... 85 more
    <console>:14: error: not found: value spark
           import spark.implicits._
                  ^
    <console>:14: error: not found: value spark
           import spark.sql
                  ^
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala>    
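
For reference, my environment variables are set along these lines (illustrative; the winutils location is an assumption, but HADOOP_HOME must point to the folder whose bin contains winutils.exe):

    :: illustrative values -- actually set permanently via System Properties
    set SPARK_HOME=C:\Users\Akshay\Downloads\spark
    set HADOOP_HOME=C:\Users\Akshay\Downloads\winutils
    set PATH=%PATH%;%SPARK_HOME%\bin;%HADOOP_HOME%\bin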

How do I resolve this?

Akshay
  • Possible duplicate of [The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- (on Windows)](https://stackoverflow.com/questions/34196302/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions) – KARTHIKEYAN.A Sep 05 '17 at 04:17

3 Answers


As per the error message:

Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more

You should change the permissions of the C:\tmp\hive directory (on Windows, the Hive scratch dir /tmp/hive resolves to \tmp\hive on the current drive) as follows:

winutils.exe chmod -R 777 C:\tmp\hive
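
To verify that the change took effect, you can list the permissions again; after the chmod they should show up as rwxrwxrwx (a quick check, assuming winutils.exe is on your PATH, otherwise use its full path):

    winutils.exe ls C:\tmp\hive

With the permissions fixed, restarting spark-shell should print `Spark context available as 'sc'` and `Spark session available as 'spark'` instead of the HiveSessionState error.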
Jacek Laskowski
  • It really does matter whether you use the 64-bit or 32-bit version of winutils. I struggled with this until I realized I wasn't using the right winutils.exe. – wrschneider Jul 13 '17 at 15:14
  • @wrschneider Correct! I've seen this in my Spark workshops a few times already and came to the same conclusion. Is there a way to find out which bit version Windows is? What worked for you? Where did you find the proper `winutils.exe`? Share the news. Thanks. – Jacek Laskowski Jul 13 '17 at 23:54
  • @JacekLaskowski I got it from here - https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin - and it solved the issue for me. – vpekar Sep 05 '17 at 14:11
  • For me the Scala shell (i.e. the spark-shell command) works fine without throwing any error, but pyspark throws the error, even though `/tmp/hive` has 777 permissions! – arun Jan 12 '18 at 18:34
  • Why doesn't the Spark community fix this? The issue is still there; I am using 2.3.4. – dev Jan 23 '20 at 17:08
  • @etl_devs You're a member of the Spark community, so... when are you going to submit a pull request? ;-) – Jacek Laskowski Jan 23 '20 at 19:18
  • @etl_devs It **is** a problem with Hadoop, which requires a POSIX-compliant file system, which NTFS (or whatever Windows 10 uses) is not. Sorry. Spark simply uses Hadoop for file-system access. – Jacek Laskowski Jan 24 '20 at 08:22
  • @JacekLaskowski True, that's why I say Windows sucks; I can't buy a Mac (I am poor), but I fixed the issue with Spark. I was getting it even after using winutils, but for me the problem was that my user did not have access to the C: drive, and I think 99% of people are getting it because of an access issue with the Windows user. Also, if you are using a laptop provided by your company, connect to the company network and then change the permissions of C:\tmp with winutils, and your Spark will start working. Don't restart the PC if you can't access your organization's network, and you are good. – dev Jan 25 '20 at 06:49
  • The only problem I can't get around is metastore_db; I have to delete the folder every time I load a new file into a Spark DataFrame. Any idea? – dev Jan 25 '20 at 06:52

This means that the Hive root scratch directory doesn't have the required permissions.

Follow the steps below to resolve it.

  1. First, change the permissions using the command below.

    $>C:\Users\xxxx\winutils\winutils\bin\winutils.exe chmod 777 .\hive\

  2. If the error persists even after this, check the trust relationship between your workstation and the primary domain using the command below.

    $>C:\Users\xxxx\winutils\winutils\bin\winutils.exe ls .\hive\

If this gives "FindFileOwnerAndPermission error (1789): The trust relationship between this workstation and the primary domain failed.", it means that your computer's domain controller is not reachable. A possible reason is that you are not on the same VPN as your domain controller. Connect to the VPN and try again.
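
Once the directory is writable, a quick smoke test in spark-shell confirms the session comes up (a sketch; output abbreviated, and the app id will differ):

    C:\>spark-shell
    Spark context available as 'sc' (master = local[*], app id = local-...).
    Spark session available as 'spark'.

    scala> spark.range(5).count()
    res0: Long = 5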

Azam Khan

Upgrade to Spark 2.4; the issue is resolved in the newer version.
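
After upgrading, you can confirm the running version from the shell (a minimal check; your exact patch version may differ):

    scala> spark.version
    res0: String = 2.4.0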

vaquar khan