
The environment is: JDK 1.7; CDH 5.8.0

The code is

from pyspark.ml.feature import PCA
from pyspark.mllib.linalg import Vectors
data = [(Vectors.sparse(5, [(1, 1.0), (3, 7.0)]),),
    (Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0]),),
    (Vectors.dense([4.0, 0.0, 0.0, 6.0, 7.0]),)]
df = sqlContext.createDataFrame(data,["features"])
pca = PCA(k=2, inputCol="features", outputCol="pca_features")
model = pca.fit(df)

A graph helps to describe the situation: [image]

The error stack is

[Stage 2:>                                                          (0 + 1) / 2]/usr/java/jdk1.7.0_67-cloudera/bin/java: symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 47504)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/accumulators.py", line 235, in handle
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/pipeline.py", line 69, in fit
    num_updates = read_int(self.rfile)
      File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/serializers.py", line 545, in read_int
return self._fit(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 133, in _fit
    java_model = self._fit_java(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 130, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
    raise EOFError
EOFError
----------------------------------------
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 224, in signal_handler
    self.cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 909, in cancelAllJobs
    self._jsc.sc().cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server

As far as I can tell, the Python Spark context can't connect to the Py4J Spark context because the Py4J Java server has gone down, which is caused by

symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy

So the Python Spark context can't connect to the Py4J Spark context, which is why the request from ('127.0.0.1', 47504) fails and later connection attempts are refused.

Further evidence is in the executor log, which shows

 CoarseGrainedExecutorBackend: An unknown (executor_IP:executor_port) driver disconnected
CoarseGrainedExecutorBackend: Driver (executor_IP:executor_port) disassociated! Shutting down

This means the executor can't connect to the Py4J Spark context either. The output of the following command shows the same thing:

yarn logs -applicationId application_xxxxxxxxx_xxxxxx

Container: container_e37_1484199111776_8460_01_000001 on node_xxxxx
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:94
Log Contents:
17/02/20 11:18:05 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Container: container_e37_1484199111776_8460_01_000002 on node_xxxxx_2
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:250
Log Contents:
17/02/20 11:18:06 WARN executor.CoarseGrainedExecutorBackend: An unknown (driver IP:PORT) driver disconnected

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Any idea why?

cdhit

1 Answer

It looks like the root cause of the problem is incorrect packaging of the native libraries. The problem is documented in the netlib issue tracker: https://github.com/fommil/netlib-java/issues/66

The recommended solution is to:

Try OpenBLAS or Intel's Math Kernel Library.
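
Before swapping libraries, it may help to confirm which BLAS library on a node actually exports the missing symbol. The following is only a rough sketch: the helper name library_exports is made up for illustration, and the library names "blas" and "openblas" are assumptions about what is installed. It also assumes the native loader resolves cblas_daxpy against whatever the system linker finds under those names, which may not match your setup.

```python
import ctypes
import ctypes.util


def library_exports(lib_name, symbol):
    """Report whether a shared library exports a given symbol.

    Returns True or False if the library could be loaded,
    or None if it could not be found or loaded at all.
    (Hypothetical helper, not part of Spark or netlib-java.)
    """
    # Ask the system linker configuration where the library lives.
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # ctypes raises AttributeError for a missing symbol,
    # so hasattr() is enough to test for it.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # "blas" and "openblas" are assumed names; adjust for your system.
    for name in ("blas", "openblas"):
        print(name, library_exports(name, "cblas_daxpy"))
```

If the default "blas" entry reports False while "openblas" reports True, that would be consistent with the packaging problem described in the linked issue. Run it on the driver host, since that is where the JVM died in the stack above.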