Problems running Spark with Python

Question

Two questions:

1. How do I run Python 3 with Spark? When I run ./bin/pyspark, it automatically starts Python 2.7. How can I make it use Python 3?
2. After I start pyspark, it prints this warning:

   16/12/29 17:33:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

   Does this mean I downloaded the wrong Spark build for my platform?

I am using a MacBook Pro. Thanks.

score 3 · Answer 1 · edited May 23 '17 at 12:10

Follow these steps.

For a single run only:

PYSPARK_PYTHON=python3 ./bin/pyspark
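The `PYSPARK_PYTHON=python3` prefix sets the variable only in the environment of that one command, which is why this form lasts for a single run. A minimal sketch demonstrating that behavior, with `sh -c` standing in for `./bin/pyspark` so it runs without Spark installed:

```shell
#!/bin/sh
# Start from a clean state so the demonstration is deterministic.
unset PYSPARK_PYTHON

# The prefix assignment is visible inside the child process
# (sh -c stands in for ./bin/pyspark here)...
inside=$(PYSPARK_PYTHON=python3 sh -c 'printf "%s" "$PYSPARK_PYTHON"')
echo "inside the command: $inside"                      # inside the command: python3

# ...but the current shell does not keep it afterwards.
echo "after the command: ${PYSPARK_PYTHON:-<unset>}"    # after the command: <unset>
```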

Every time (permanently):

cd
vim .bashrc

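Add these two lines at the end of the file and save it. (On macOS, Terminal opens login shells, which read ~/.bash_profile rather than ~/.bashrc, so you may need to add the lines there instead.)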

export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=python3
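If you prefer not to edit the file by hand in vim, the same two lines can be appended from the shell. This is a minimal sketch with a hypothetical helper name, `add_pyspark_exports`; it writes exactly the two lines above and skips any line that is already present, so running it twice does not duplicate entries. Adjust the interpreter path if python3 lives elsewhere on your system (`which python3` shows where).

```shell
#!/bin/sh
# Hypothetical helper: append the two PySpark exports to an rc file,
# skipping any line that is already present so re-running is safe.
add_pyspark_exports() {
  rc_file=$1
  touch "$rc_file"   # create the file if it does not exist yet

  # If the file does not end with a newline, add one before appending.
  if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
    printf '\n' >> "$rc_file"
  fi

  for line in 'export PYSPARK_PYTHON=/usr/bin/python3' \
              'export PYSPARK_DRIVER_PYTHON=python3'; do
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
  done
}
```

Running `add_pyspark_exports ~/.bashrc` replaces the vim step; you still need to source the file afterwards.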

After exiting the editor, source .bashrc so the changes take effect in the current shell:

source .bashrc

Now, when you start pyspark, it will use Python 3.
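To confirm the setup before launching pyspark, a small check can report what each variable points to. This is a sketch with a hypothetical function name, `check_pyspark_env`; it only verifies that the variables are set and that the named interpreter can be found, not which Python version it is. Inside the pyspark shell itself, `import sys; print(sys.version)` shows the interpreter actually in use.

```shell
#!/bin/sh
# Hypothetical check: report what each PySpark variable is set to and
# whether that interpreter can be found. Returns non-zero on any problem.
check_pyspark_env() {
  status=0
  for var in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON; do
    # Indirect lookup of the variable named by $var (the names are fixed above).
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "$var is not set"
      status=1
    elif command -v "$val" >/dev/null 2>&1; then
      echo "$var=$val (found)"
    else
      echo "$var=$val (not found)"
      status=1
    fi
  done
  return "$status"
}
```

Typical use: `. ~/.bashrc && check_pyspark_env`.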

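For your second question, read this. The warning has to do with 32-bit vs. 64-bit compilation of the native Hadoop library. It is not fatal: as the message itself says, Hadoop falls back to its built-in Java classes where applicable.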
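To see whether a 32-bit/64-bit mismatch applies on your machine, you can compare the machine architecture with that of the native Hadoop library. This sketch is hypothetical: the helper name `check_native_libs` is made up, it relies on the standard `file` utility, and the default directory `/usr/local/hadoop/lib/native` is borrowed from the next answer. A standalone Spark download may not include this directory at all.

```shell
#!/bin/sh
# Hypothetical helper: print the machine architecture, then what `file`
# reports (format and 32-/64-bit word size) for each libhadoop* library.
check_native_libs() {
  dir=${1:-/usr/local/hadoop/lib/native}
  echo "machine: $(uname -m)"
  if [ ! -d "$dir" ]; then
    echo "no native library directory at $dir"
    return 1
  fi
  found=0
  for lib in "$dir"/libhadoop*; do
    [ -e "$lib" ] || continue   # the glob matched nothing
    file "$lib"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no libhadoop* files in $dir"
    return 1
  fi
}
```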

score 0 · Answer 2 · answered Mar 24 '17 at 16:49

Add this to your ~/.bashrc:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"

export HADOOP_COMMON_LIB_NATIVE_DIR="/usr/local/hadoop/lib/native/"

or:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"
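Because both forms append to the existing HADOOP_OPTS, sourcing ~/.bashrc more than once adds the same -Djava.library.path flag again each time. A minimal sketch that adds the flag only when it is missing; the helper name `add_hadoop_native_opts` is hypothetical, and the default directory is the one from the second variant above.

```shell
#!/bin/sh
# Hypothetical helper: add -Djava.library.path=<dir> to HADOOP_OPTS once,
# even if ~/.bashrc is sourced several times.
add_hadoop_native_opts() {
  native_dir=${1:-/usr/local/hadoop/lib/native}
  # Pad with spaces so the flag is matched as a whole word.
  case " ${HADOOP_OPTS:-} " in
    *" -Djava.library.path=$native_dir "*) ;;   # already present: leave as is
    *) HADOOP_OPTS="${HADOOP_OPTS:-} -Djava.library.path=$native_dir" ;;
  esac
  export HADOOP_OPTS
}
```

In ~/.bashrc, define the function and call `add_hadoop_native_opts` in place of the `export HADOOP_OPTS=...` line.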
