
I know there are many questions like this, and trust me, I have tried all the solutions, but I keep getting the same error again and again. I am trying to access Spark on a remote cluster from my local machine using databricks-connect in a conda environment, and the IDE I use is PyCharm.
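For context, this is roughly the kind of script I am running (a minimal sketch; the session is created the standard databricks-connect way, and the trivial `range` job is just a placeholder sanity check):

```python
from pyspark.sql import SparkSession

# databricks-connect picks up the remote cluster details configured via
# `databricks-connect configure`; getOrCreate() submits work to that cluster
# rather than starting a local Spark master.
spark = SparkSession.builder.getOrCreate()

# Placeholder check: run a trivial job on the remote cluster.
print(spark.range(10).count())
```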

I activate the environment either in the Anaconda Prompt or in PyCharm's built-in terminal. Both return this error:

 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:382)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:397)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:390)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:274)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:262)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:807)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
        at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2693)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2693)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
        at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
        at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/12/23 22:35:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/12/23 22:35:59 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
View job details at https://"databricks-name".cloud.databricks.com/?o=0#/setting/clusters/0-535-sh256/sparkUi
* Simple PySpark test passed
* Testing dbutils.fs

All over the internet people say to download winutils and set the HADOOP_HOME variable to point at it. I have tried that plenty of times, with every variation I have seen, but nothing works and I keep getting this error.
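To be concrete, this is the kind of thing I have been trying (a sketch only; `C:\hadoop` is just an example location for the downloaded winutils, not my actual path):

```python
import os

# Assumes winutils.exe was placed at C:\hadoop\bin\winutils.exe (example path).
# The variables have to be visible to the process that launches the JVM,
# so they are set before the SparkSession is created. Setting HADOOP_HOME
# system-wide (e.g. `setx HADOOP_HOME C:\hadoop` and adding %HADOOP_HOME%\bin
# to PATH) and then restarting PyCharm should have the same effect.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] += os.pathsep + r"C:\hadoop\bin"

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
```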

Also, it's strange, because I thought winutils is only needed for local Spark. I don't need Spark locally, since I am trying to connect to it via databricks-connect. Can anyone help me? I have been stuck on this for a few days already, thanks.

  • Does this answer your question? [Failed to locate the winutils binary in the hadoop binary path](https://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path) – Matt Andruff Dec 24 '21 at 14:38
  • Certainly, as specified in the question, I saw that question and many similar ones. I wrote explicitly that I tried downloading winutils and putting it inside the bin folder, either of Hadoop or of a separate winutils directory. Neither helped – Boyuis Dec 24 '21 at 22:22
  • Also, is winutils needed when Spark is running on a remote cluster? In any case, I tried it – Boyuis Dec 24 '21 at 22:27
  • It is needed, to evaluate the local user name that will be submitting the application to the cluster... You should be focusing on why `null` is part of the file path in the error – OneCricketeer Dec 31 '21 at 08:09
  • Thanks for your help, I solved it. But I ran into another error on Windows; I posted it here: https://stackoverflow.com/questions/70552946/error-in-pycharm-error-sparkcontext-failed-to-add-dependencies-jar Thanks for the help – Boyuis Jan 02 '22 at 01:04
