I have IntelliJ IDEA installed on my laptop. I am trying to do some big data Spark POCs written in Scala. My requirement is that the Spark-Scala code written in IntelliJ should run on the Spark cluster when I click Run. My Spark cluster resides in the Windows Azure cloud. How can I achieve this?
- The usual method is to generate a fat jar using the sbt-assembly plugin, copy the jar to the Spark cluster's master node, ssh to the node, and run it with spark-submit. You can definitely do the first step within IntelliJ - https://stackoverflow.com/questions/28459333/how-to-build-an-uber-jar-fat-jar-using-sbt-within-intellij-idea. Do you need the other steps to happen within IntelliJ as well? – xan Mar 15 '18 at 06:45
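For the fat-jar step, here is a minimal build.sbt sketch of the setup the comment above describes; the project name, Scala version, and Spark version are placeholders for your own, and it assumes the sbt-assembly plugin is enabled in project/plugins.sbt:

```scala
// build.sbt - minimal fat-jar setup (a sketch; names and versions are
// placeholders). Assumes project/plugins.sbt contains:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
name := "spark-poc"
version := "0.1"
scalaVersion := "2.11.12"

// Mark Spark as "provided" so it is not bundled into the fat jar;
// the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"

// Resolve duplicate META-INF entries when merging dependency jars.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```

Running `sbt assembly` then produces something like target/scala-2.11/spark-poc-assembly-0.1.jar, which is the file you copy to the master node and pass to spark-submit.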
2 Answers
One way is to create a script that runs the generated jar file, and run that script.
Another way is to use the Azure Toolkit plugin.
You can use the Azure Toolkit for IntelliJ plugin to submit, run, and debug Spark applications.
Search for the plugin in IntelliJ's plugin settings and install it.
To submit and run the application, you can follow the documentation here:
https://azure.microsoft.com/en-us/blog/hdinsight-tool-for-intellij-is-ga/
Here is an example: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-plugin
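Whichever route you pick, what gets submitted is an ordinary Spark application. A minimal sketch of such a job (the object name and the wasb:// input path are placeholders for your own code and storage container):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal word-count job as a submission target (a sketch; the input
// path below is a placeholder for a file in your cluster's storage).
object WordCount {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master is supplied by spark-submit
    // or by the Azure Toolkit plugin at submission time.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("wasb:///example/data/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```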
Hope this helps!

- Thank you Shankar... I am going through the links and it seems like this is what I was looking for. – TomG Mar 15 '18 at 11:48
- Glad that helped. I also work with Azure and Spark, but I don't use the plugins; I just use scripts to run the jobs. – koiralo Mar 15 '18 at 11:50
- I was trying to link the HDInsight cluster but got the error "Authentication Error: Socket operation on nonsocket: connect". Any idea why this happens? – TomG Mar 15 '18 at 15:51
- I am not sure about that error. The script simply writes out all the steps that we do manually when submitting the jar to the Azure cluster. – koiralo Mar 15 '18 at 15:55
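For reference, such a script can even live in Scala itself via scala.sys.process; everything below (host, user, jar path, and main class) is a placeholder for your own setup:

```scala
import scala.sys.process._

// Sketch of automating the manual submit steps; host, user, paths, and
// the main class are placeholders, and sbt/scp/ssh must be on the PATH.
object SubmitJob extends App {
  // 1. Build the fat jar (see the sbt-assembly comment on the question).
  "sbt assembly".!

  // 2. Copy the jar to the cluster's master node.
  "scp target/scala-2.11/spark-poc-assembly-0.1.jar user@master-node:/tmp/".!

  // 3. Launch it with spark-submit on the master node.
  Seq("ssh", "user@master-node",
    "spark-submit --class WordCount /tmp/spark-poc-assembly-0.1.jar").!
}
```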
Step 1: Before starting the process you have to download the Hadoop bin (winutils):
https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
and set HADOOP_HOME in your environment variables, for example C:\Hadoop\hadoop.
Step 2: Download the desired Spark version and
add the path C:\Hadoop\spark-1.6.0-bin-hadoop2.6\bin to your environment variables.
Step 3: Open cmd, go to the Spark bin folder (C:\Hadoop\spark-1.6.0-bin-hadoop2.6\bin) and run the following command: spark-class org.apache.spark.deploy.master.Master. It will print the Spark master URL, for example spark://localhost:7077.
Step 4: Open another cmd, go to the same bin folder and run: spark-class org.apache.spark.deploy.worker.Worker SparkMasterIp (the master URL from step 3).
Step 5: To check whether it is working, test with the following command: C:\Hadoop\spark-1.6.0-bin-hadoop2.6\bin\spark-shell --master spark://localhost:7077
Now you can build your jar and submit it with spark-submit from cmd.
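To confirm the local cluster is actually doing work before submitting your own jar, you can paste a small job into the spark-shell from step 5 (just a sketch; sc is the SparkContext the shell predefines):

```scala
// Run inside spark-shell: a parallelized sum that exercises
// the worker started in step 4.
val rdd = sc.parallelize(1 to 1000)
println(rdd.sum())  // expected output: 500500.0
```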
