I am writing my first test.py in Spark.
Code:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My Test")
sc = SparkContext(conf = conf)
lines = sc.textFile("file:///home/hduser/spark-1.5.2-bin-hadoop2.6/README.md") # Create an RDD called lines
lines.count()
lines.first()
Output:
hduser@borischow-VirtualBox:~/spark-1.5.2-bin-hadoop2.6$ bin/spark-submit test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/12/28 17:42:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/28 17:42:46 WARN Utils: Your hostname, borischow-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
15/12/28 17:42:46 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/12/28 17:42:48 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/12/28 17:42:48 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
hduser@borischow-VirtualBox:~/spark-1.5.2-bin-hadoop2.6$
Questions:
I cannot see the expected output from lines.count() and lines.first(). Why?
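My guess is that, unlike the interactive pyspark shell, a script run through spark-submit does not echo the value of an expression, so lines.count() and lines.first() are computed but never shown. A plain-Python sketch of that REPL-vs-script difference (no Spark needed):

```python
# In a REPL, evaluating an expression echoes its value; in a script the
# value is computed and then discarded unless explicitly printed.
lines = ["# Apache Spark", "Spark is a fast engine"]

lines_count = len(lines)   # runs fine, but a script prints nothing here
first_line = lines[0]      # same: no output on its own

# Explicit prints are needed -- presumably the same applies to RDD
# actions, i.e. print(lines.count()) and print(lines.first()).
print(lines_count)   # 2
print(first_line)    # # Apache Spark
```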
What are the reasons behind the warning messages?
15/12/28 17:42:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/28 17:42:46 WARN Utils: Your hostname, borischow-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
15/12/28 17:42:46 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/12/28 17:42:48 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/12/28 17:42:48 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
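For what it's worth, here is a configuration sketch I believe would silence two of these warnings; the property name spark.app.id and the SPARK_LOCAL_IP environment variable come from the Spark configuration docs, and the app id value "my-test" is just a placeholder I made up:

```python
from pyspark import SparkConf

# Setting spark.app.id explicitly should stop the MetricsSystem warning
# about falling back to the default source name.
conf = (SparkConf()
        .setMaster("local")
        .setAppName("My Test")
        .set("spark.app.id", "my-test"))

# The loopback-address warning is about the environment, not the conf:
# exporting SPARK_LOCAL_IP (e.g. `export SPARK_LOCAL_IP=10.0.2.15` in the
# shell before spark-submit) tells Spark which address to bind to.
```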
Thanks a lot!