C:\spark-3.0.0-preview2-bin-hadoop2.7\bin>pyspark
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
20/05/18 10:55:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
      /_/
Using Python version 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020 22:20:19)
SparkSession available as 'spark'.
>>> 20/05/18 10:55:56 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

- This looks similar: https://stackoverflow.com/questions/60257377/encountering-warn-procfsmetricsgetter-exception-when-trying-to-compute-pagesi – jgreve May 18 '20 at 05:49
- I tried that too, I added my Spark/python path to the environment but still I face the same issue. "ProcessTree metrics is stopped" – dhanush narayanan May 25 '20 at 20:42
- Hmm... what do you get with echo %PYTHONPATH% ? (hoping I'll see smth obvious) – jgreve May 27 '20 at 06:31
- Does this answer your question? [Encountering " WARN ProcfsMetricsGetter: Exception when trying to compute pagesize" error when running Spark](https://stackoverflow.com/questions/60257377/encountering-warn-procfsmetricsgetter-exception-when-trying-to-compute-pagesi) – dannylee8 Nov 16 '20 at 04:52
1 Answer
Following up on my comment, I wanted to see the current values of your PYTHONPATH and PATH environment variables. Here's an example from my machine. Just post the entire output of the echo command:
edit: disclaimer - I haven't run Spark at all, just interested in it and am sifting through the documentation because I plan to learn it at some point.
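If copying from the cmd window is awkward, here's a small Python sketch (nothing Spark-specific, just stdlib) that dumps the same variables; "(not set)" for a missing one is useful information in itself:

```python
import os

# Print the environment variables relevant to this problem.
# A value of "(not set)" tells us the variable is missing entirely.
for var in ("SPARK_HOME", "PYTHONPATH", "PATH"):
    print(f"{var}={os.environ.get(var, '(not set)')}")
```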
PATH env-var
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\Windows\system32
C:\Windows
C:\Windows\System32\Wbem
C:\Windows\System32\WindowsPowerShell\v1.0\
C:\Windows\System32\OpenSSH\
C:\Program Files\jdk-14.0.1_windows-x64_bin\jdk-14.0.1
C:\Hadoop\bin
C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\Scripts\
(i) C:\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src
C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\Scripts\
C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\
C:\Users\lenovo\AppData\Local\Microsoft\WindowsApps
Ok, so unpacking your PATH env-var (above) we see you're using "spark-3.0.0-preview2" (i). Hmm... from the Spark download page (emphasis added): "Preview releases are not meant to be functional, i.e. they can and highly likely will contain critical bugs or documentation errors. The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019." (edit: fwiw, I think it is a bit unfair that the download page's drop-down list defaults to a preview version, so I can understand why you chose that)
Personally I would look at using a different version of Spark (by different I mean older and more stable). Also, if you know linux at all I'd suggest doing this in a virtual machine. Maybe try this article on Medium: "Practical Apache Spark in 10 minutes."
But if you really have to use Windows I'd look for an older (and more stable) version of Spark.
Suggested PYTHONPATH
From that other issue I mentioned, they talk about pointing your PYTHONPATH env-var to SPARK_HOME as follows:
Adding PYTHONPATH environment variable with value as:
(ii) set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip:%PYTHONPATH%
Splitting out those PYTHONPATH directories from (ii), we get:
(iii) %SPARK_HOME%\python
(iv) %SPARK_HOME%\python\lib\py4j-<version>-src.zip
(v) %PYTHONPATH%
Since we're on Windows it looks like the ':%PYTHONPATH%' at the end of (ii) is a typo (see between (iv) and (v) above). Colons on unix/linux, semicolons on Windows.
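Python actually exposes the platform's PATH-style separator as `os.pathsep`, so you can confirm the right character on whatever machine you run this on:

```python
import os

# os.pathsep is the separator for PATH-style environment variables:
# ';' on Windows, ':' on unix/linux.
print(os.pathsep)
```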
Anyway, if I had to make this work I'd start by looking to see if the folders (iii) and (iv) actually exist. That means starting with the value of your SPARK_HOME env-var.
Note the <version> part of (iv): it won't literally be "<version>". For example, if you do go ahead with the preview release, it would be the py4j version, something like "0.10.8.1" (matching the py4j folder already in your PATH above). (fwiw, this is also why I was asking for the value of your PYTHONPATH env-var).
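Here's a hedged sketch of that existence check. The fallback SPARK_HOME and the py4j version "0.10.8.1" are assumptions taken from the paths shown in the question, so adjust both for your actual install:

```python
import os

# Assumption: SPARK_HOME matches the install path shown in the question.
spark_home = os.environ.get("SPARK_HOME",
                            r"C:\spark-3.0.0-preview2-bin-hadoop2.7")

# The two entries PYTHONPATH needs, per (iii) and (iv) above.
# py4j version 0.10.8.1 is assumed from the PATH in the question.
candidates = [
    os.path.join(spark_home, "python"),
    os.path.join(spark_home, "python", "lib", "py4j-0.10.8.1-src.zip"),
]
for path in candidates:
    status = "exists" if os.path.exists(path) else "MISSING"
    print(f"{status}: {path}")
```

If either line prints MISSING, that entry in PYTHONPATH points nowhere and the import machinery will silently skip it.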

- C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\jdk-14.0.1_windows-x64_bin\jdk-14.0.1;C:\Hadoop\bin;C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\Scripts\;C:\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src;C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\Scripts\;C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\;C:\Users\lenovo\AppData\Local\Microsoft\WindowsApps; – dhanush narayanan Jun 17 '20 at 02:03
- I think that `:` is just a typo and should be a `;` so that it just adds the two paths stated to the existing value of %PYTHONPATH% – Luigi Plinge Jul 19 '20 at 23:07
- I'm using the non-preview 3.0.0 and still getting the same error. – Luigi Plinge Jul 19 '20 at 23:08