
I'm a dummy on Ubuntu 16.04, desperately attempting to make Spark work. I've tried to fix my problem using the answers found here on Stack Overflow, but I couldn't resolve anything. When I launch Spark with the command `./spark-shell` from the `bin` folder, I get this message:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The Java version I'm using is:

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

Spark is the latest version: 2.0.1 with Hadoop 2.7. I've also retried with an older Spark package, 1.6.2 with Hadoop 2.4, but I get the same result. I also tried to install Spark on Windows, but it seems harder than doing it on Ubuntu.

I also tried to run some commands on Spark from my laptop: I can define an object, I can create an RDD and store it in cache, and I can use functions like `.map()`, but when I try to run the function `.reduceByKey()` I receive a long series of error messages.

Maybe it's the Hadoop library, which is compiled for 32-bit, while I'm on a 64-bit system?
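One way to check this, sketched below (the library path is an assumption and depends on where Hadoop is unpacked):

```shell
# Check whether the OS kernel is 64-bit
uname -m                      # prints x86_64 on a 64-bit Ubuntu
# Check whether the JVM is 64-bit (look for "64-Bit Server VM" in the banner):
# java -version
# If Hadoop is unpacked somewhere, inspect the native library itself
# (the path is an assumption; adjust it to your HADOOP_HOME):
# file "$HADOOP_HOME/lib/native/libhadoop.so"
```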

Thanks.

cane_mastino
  • You shouldn't need to use the native-hadoop libraries -- that is just a warning indicating things will run more slowly. It sounds like you have a different issue that is preventing you from doing a `reduceByKey` -- perhaps ask a different question with the resulting error messages you get – David Apr 13 '17 at 01:15

2 Answers


Steps to fix:

  • download the Hadoop binaries
  • unpack them to a directory of your choice
  • set `HADOOP_HOME` to point to that directory
  • add `$HADOOP_HOME/lib/native` to `LD_LIBRARY_PATH`
  • Go to the `/etc/profile.d` directory and create a `hadoop.sh` file in there with `export HADOOP_HOME=/opt/hadoop/hadoop`, `export HIVE_HOME=/opt/hadoop/hive`, `export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin`. After you save the file, make sure to `chmod +x /etc/profile.d/hadoop.sh` and `source /etc/profile.d/hadoop.sh` – cane_mastino Oct 13 '16 at 14:08
  • It's best to use Spark config files for that by editing `conf/spark-env.sh` in your `SPARK_HOME`. –  Oct 13 '16 at 15:47
  • Thank you @LostInOverflow, so I should do: `cd /spark-2.0.1-bin-hadoop2.7/conf` `sudo nano spark-env.sh` and write inside it: `export HADOOP_HOME=/home/myname/hadoop-2.7.3` then `export PATH=$HADOOP_HOME/lib/native` is it right? Sorry for the redundant querying – cane_mastino Oct 14 '16 at 12:42
  • If these are the paths. –  Oct 14 '16 at 16:51
  • @cane_mastino You don't need to set execute permissions on files under `/etc/profile.d` (and doing so has security risks). Those files are "sourced" (the same way you do) from `/etc/profile`, so nobody should need to execute them. – Luis Vazquez Jul 18 '20 at 14:26
  • Why is this needed? Do we even need this to run Spark? – Serrano Sep 14 '22 at 14:55
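The steps in this answer can be sketched as shell commands. The version (2.7.3), download URL, and unpack location are assumptions; substitute the release and paths you actually use:

```shell
# 1-2. Download a Hadoop release and unpack it (URL/version are examples):
#   wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
#   tar -zxvf hadoop-2.7.3.tar.gz

# 3. Point HADOOP_HOME at the unpacked directory
export HADOOP_HOME="$HOME/hadoop-2.7.3"

# 4. Expose the native libraries to the dynamic linker
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
```

Exporting in a terminal only affects that shell session; to make the setting stick, put the two `export` lines in a startup or config file, as the comments above discuss.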
  1. Download the Hadoop binary (link) and put it in your home directory (you can choose a different Hadoop version if you like and change the next steps accordingly)
  2. Unzip the folder in your home directory using the following command: `tar -zxvf hadoop_file_name`
  3. Now add `export HADOOP_HOME=~/hadoop-2.8.0` to your `.bashrc` file. Open a new terminal and try again.

Source: Install PySpark on Ubuntu
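Step 3 as commands (a sketch; the `hadoop-2.8.0` path comes from the answer, and note the comment below recommending `spark-env.sh` over `.bashrc`):

```shell
# Persist HADOOP_HOME for future shells (hadoop-2.8.0 path is from the answer)
echo 'export HADOOP_HOME=~/hadoop-2.8.0' >> ~/.bashrc

# .bashrc is only read by new interactive shells, so set it for this one too
export HADOOP_HOME="$HOME/hadoop-2.8.0"
```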

  • You should not use `.bashrc` but the configuration file `spark-env.sh` for setting Spark's required environment variables. – Luis Vazquez Jul 18 '20 at 14:24
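Following that advice, a minimal `conf/spark-env.sh` might look like the sketch below. Both paths are assumptions; the file is normally created by copying the `conf/spark-env.sh.template` that ships with Spark:

```shell
# conf/spark-env.sh -- sourced by Spark's launch scripts on startup
export HADOOP_HOME=/home/myname/hadoop-2.7.3
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
```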