
Using: OS X 10.11.2, Spark 1.5.2, Python 2.7.10, IPython 4.0.1

To open Spark from the terminal in an IPython notebook, I use this command:

IPYTHON=1 $SPARK_HOME/bin/pyspark
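
For reference, since `IPYTHON=1` is deprecated in recent Spark releases (see the first comment below), a sketch of the equivalent launch using the newer environment variables for an IPython notebook front end would be:

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark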

My goal is to parallelize a vector of integers, then apply the `count`, `first`, and `take` functions to the resulting RDD.

My workflow is the following (after Spark is already running inside the IPython notebook):

rdd = sc.parallelize([1, 2, 3])
rdd.collect()

Then, when I attempt to run the `count` function like so,

rdd.count()

I receive the following error. (Please note: I realize the source of the error is the illegal port number; what I could use help with is resolving it, for example by changing the port number that gets used. My suspicion is that the error originates in the interaction between IPython and Spark, but from looking around, it seems the developers have already addressed this kind of error.)

In [3]: rdd.count()
15/12/21 14:27:39 INFO SparkContext: Starting job: count at <ipython-input-3-a0443394e570>:1
15/12/21 14:27:39 INFO DAGScheduler: Got job 1 (count at <ipython-input-3-a0443394e570>:1) with 4 output partitions
15/12/21 14:27:39 INFO DAGScheduler: Final stage: ResultStage 1(count at <ipython-input-3-a0443394e570>:1)
15/12/21 14:27:39 INFO DAGScheduler: Parents of final stage: List()
15/12/21 14:27:39 INFO DAGScheduler: Missing parents: List()
15/12/21 14:27:39 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[1] at count at <ipython-input-3-a0443394e570>:1), which has no missing parents
15/12/21 14:27:39 INFO MemoryStore: ensureFreeSpace(4200) called with curMem=2001, maxMem=555755765
15/12/21 14:27:39 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 530.0 MB)
15/12/21 14:27:39 INFO MemoryStore: ensureFreeSpace(2713) called with curMem=6201, maxMem=555755765
15/12/21 14:27:39 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 530.0 MB)
15/12/21 14:27:39 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:64049 (size: 2.6 KB, free: 530.0 MB)
15/12/21 14:27:39 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
15/12/21 14:27:39 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 1 (PythonRDD[1] at count at <ipython-input-3-a0443394e570>:1)
15/12/21 14:27:39 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks
15/12/21 14:27:39 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 2071 bytes)
15/12/21 14:27:39 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 2090 bytes)
15/12/21 14:27:39 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 6, localhost, PROCESS_LOCAL, 2090 bytes)
15/12/21 14:27:39 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 7, localhost, PROCESS_LOCAL, 2090 bytes)
15/12/21 14:27:39 INFO Executor: Running task 0.0 in stage 1.0 (TID 4)
15/12/21 14:27:39 INFO Executor: Running task 1.0 in stage 1.0 (TID 5)
15/12/21 14:27:39 INFO Executor: Running task 2.0 in stage 1.0 (TID 6)
15/12/21 14:27:39 INFO Executor: Running task 3.0 in stage 1.0 (TID 7)
15/12/21 14:27:39 ERROR Executor: Exception in task 2.0 in stage 1.0 (TID 6)
java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/12/21 14:27:39 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 4)
java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/12/21 14:27:39 ERROR Executor: Exception in task 3.0 in stage 1.0 (TID 7)
java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/12/21 14:27:39 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 5)
java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
odule named dateutil.tz
?615/12/21 14:27:39 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, localhost): java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

15/12/21 14:27:39 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
15/12/21 14:27:39 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/12/21 14:27:39 INFO TaskSetManager: Lost task 2.0 in stage 1.0 (TID 6) on executor localhost: java.lang.IllegalArgumentException (port out of range:1315905645) [duplicate 1]
15/12/21 14:27:39 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/12/21 14:27:39 INFO TaskSetManager: Lost task 3.0 in stage 1.0 (TID 7) on executor localhost: java.lang.IllegalArgumentException (port out of range:1315905645) [duplicate 2]
15/12/21 14:27:39 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/12/21 14:27:39 INFO TaskSetManager: Lost task 1.0 in stage 1.0 (TID 5) on executor localhost: java.lang.IllegalArgumentException (port out of range:1315905645) [duplicate 3]
15/12/21 14:27:39 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/12/21 14:27:39 INFO TaskSchedulerImpl: Cancelling stage 1
15/12/21 14:27:39 INFO DAGScheduler: ResultStage 1 (count at <ipython-input-3-a0443394e570>:1) failed in 0.537 s
15/12/21 14:27:39 INFO DAGScheduler: Job 1 failed: count at <ipython-input-3-a0443394e570>:1, took 0.564707 s
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-3-a0443394e570> in <module>()
----> 1 rdd.count()

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/pyspark/rdd.pyc in count(self)
   1004         3
   1005         """
-> 1006         return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
   1007 
   1008     def stats(self):

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/pyspark/rdd.pyc in sum(self)
    995         6.0
    996         """
--> 997         return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
    998 
    999     def count(self):

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/pyspark/rdd.pyc in fold(self, zeroValue, op)
    869         # zeroValue provided to each partition is unique from the one provided
    870         # to the final reduce call
--> 871         vals = self.mapPartitions(func).collect()
    872         return reduce(op, vals, zeroValue)
    873 

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/pyspark/rdd.pyc in collect(self)
    771         """
    772         with SCCallSiteSync(self.context) as css:
--> 773             port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
    774         return list(_load_from_socket(port, self._jrdd_deserializer))
    775 

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539 
    540         for temp_arg in temp_args:

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     34     def deco(*a, **kw):
     35         try:
---> 36             return f(*a, **kw)
     37         except py4j.protocol.Py4JJavaError as e:
     38             s = e.java_exception.toString()

/Users/jason/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 4, localhost): java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:909)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:908)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: port out of range:1315905645
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more

Any ideas?

  • `IPYTHON=1` has been deprecated for a while now. I would simply try with `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON`. – zero323 Dec 21 '15 at 20:12
  • You can use Jupyter; read [Link Apache Spark with jupyter](http://stackoverflow.com/questions/33064031/link-spark-with-ipython-notebook/33065359#33065359). – Alberto Bonsanto Dec 21 '15 at 21:07
  • Possible duplicate of [Exception: could not open socket on pyspark](http://stackoverflow.com/questions/32299656/exception-could-not-open-socket-on-pyspark) – zero323 Mar 21 '16 at 00:49
  • I am also having the same problem. It happens when an `rdd.sum()` is computed inside a `for i in xrange(n)` loop in my PySpark program. I am using Spark 1.6.1 deployed as yarn-client. The log reported `port out of range`; then somehow the driver shut down, which caused a socket timeout. I have been debugging for a couple of days and it is driving me crazy. – panc Jul 08 '16 at 21:30

1 Answer


When PySpark starts daemon.py, it reads a few bytes from the daemon's stdout to get the worker port number. If some configuration on your machine adds extra output to that stream, the stray bytes get parsed as the port instead, which produces exactly this kind of "port out of range" failure. Note the stray `odule named dateutil.tz` interleaved in your log above: it suggests a `No module named dateutil.tz` import error was written to the same stream.
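
You can check this against the reported "port" by decoding it as four raw bytes. A minimal sketch, assuming Spark reads the port as a big-endian 32-bit integer (Java's `DataInputStream.readInt`, consistent with the `PythonWorkerFactory` frames in the stack trace):

import struct

# The illegal "port" from the log, decoded as a big-endian 32-bit integer,
# i.e. the four bytes the JVM consumed from the daemon's stdout.
port = 1315905645
print(struct.pack(">i", port))  # b'No m'

`No m` is the start of `No module named dateutil.tz`, which matches the leftover `odule named dateutil.tz` in the log. So the fix is to make sure the Python used by the workers can import its dependencies and prints nothing extra at startup, for example by installing `python-dateutil` into that environment.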

Guozhen