I am trying to run a simple WordCount job in an IPython notebook with Spark connected to an AWS EC2 cluster. The program works perfectly when I use Spark in local standalone mode, but it runs into trouble when I try to connect it to the EC2 cluster.
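
For context, my notebook builds its SparkContext against the cluster's master URL along the lines of the sketch below; the hostname is a placeholder for whatever spark-ec2 printed when the cluster came up (it also appears at the top of the Spark UI).

    from pyspark import SparkConf, SparkContext

    # Placeholder master URL; spark-ec2 prints the real hostname
    # when the cluster launches.
    conf = (SparkConf()
            .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
            .setAppName("WordCount"))
    sc = SparkContext(conf=conf)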

I have taken the following steps:

I have followed the instructions given in this Supergloo blog post.

No errors appear until the last line, where I try to write the output to a file. (Because of Spark's lazy evaluation, this is when the program really starts to execute.)
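
The job itself is essentially the standard WordCount, something like the sketch below (the input and output paths are placeholders). Everything before the final line is a lazy transformation; saveAsTextFile is the action that actually submits the job to the cluster.

    # Transformations are lazy; none of these lines runs anything yet.
    lines = sc.textFile("s3n://my-bucket/words.txt")  # placeholder input
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # saveAsTextFile is an action: only here does Spark submit the job.
    counts.saveAsTextFile("s3n://my-bucket/counts")   # placeholder output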

That final write is where I get the error:

[Stage 0:>                                                          (0 + 0) / 2]16/08/05 15:18:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Actually, there is no error as such: I get this warning and the program goes into an indefinite wait state. Nothing happens until I kill the IPython notebook.

I have seen this Stack Overflow post and have reduced the number of cores to 1 and the memory to 512 MB by appending these options to the main command:

--total-executor-cores 1 --executor-memory 512m
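
Equivalently, if the SparkContext is created inside the notebook rather than through the pyspark launcher, the same limits can be set as configuration properties; spark.cores.max and spark.executor.memory are the documented counterparts of those two flags.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
            .set("spark.cores.max", "1")            # --total-executor-cores 1
            .set("spark.executor.memory", "512m"))  # --executor-memory 512m
    sc = SparkContext(conf=conf)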

The screen capture from the Spark UI is as follows:

[screenshot: Spark UI]

This clearly shows that neither the cores nor the memory are being fully utilized.

Finally, I see from this Stack Overflow post that:

The spark-ec2 script configures the Spark cluster in EC2 as standalone, which means it cannot work with remote submits. I struggled with this same error you described for days before figuring out it's not supported. The error message is unfortunately incorrect.

So you have to copy your files over and log into the master to execute your Spark task.

If this is indeed the case, then there is nothing more to be done; but since that statement was made in 2014, I am hoping that in the last two years the script has been fixed or a workaround has appeared. If there is any workaround, I would be grateful if someone could point it out to me.

Thank you for reading this far, and for any suggestions offered.


1 Answer


You cannot submit jobs except from the master, as you are seeing, unless you set up a REST-based Spark job server.
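
As a rough sketch of that route, here is what a remote client might look like against spark-jobserver, one such REST server. The hostname, jar name, and job class below are illustrative; port 8090 and the WordCountExample class come from spark-jobserver's own examples, and your values will differ.

    import requests

    # Placeholder master hostname; 8090 is spark-jobserver's default port.
    host = "http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8090"

    # Upload the application jar under the app name "test".
    with open("job-server-tests.jar", "rb") as jar:
        requests.post(host + "/jars/test", data=jar)

    # Start a job; the POST body is the job's input configuration.
    r = requests.post(host + "/jobs",
                      params={"appName": "test",
                              "classPath": "spark.jobserver.WordCountExample"},
                      data="input.string = a b c a b see")
    print(r.json())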

WestCoastProjects