18

I am running this code on a local machine:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/Users/username/Spark/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I'd like to run the program on different files; it currently only runs on README.md. How do I pass the path of another file (or any other argument, for that matter) when running Spark? For example, I'd like to change contains("a") to another letter.

I run the program with:

$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/scala-2.10/simple-project_2.10-1.0.jar

Thanks!

monster

2 Answers

22

When you set up your main in

 def main(args: Array[String]) {

you are preparing your main to accept anything after the .jar line as an argument. Those values are collected into an array named 'args' for you. You then access them as usual with args(n).

It is a good idea to check your arguments for type and/or format, especially if anyone other than you might run this.

So instead of setting the

val logFile = "String here"

set it

val logFile = args(0)

and then pass the file path as the first argument. Check the spark-submit docs for more on that, but basically you just add it after the .jar path, on the next line of your command.
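
As a rough illustration of the argument check mentioned above, here is a minimal sketch that takes the file path as the first argument and the letter to search for as the second. The names AppArgs, ArgParsing and parseArgs, and the two-argument layout, are assumptions made up for this example; they are not part of Spark.

```scala
// Hypothetical container for the two arguments this sketch expects.
final case class AppArgs(logFile: String, letter: String)

// Hypothetical helper, not part of Spark: validates the raw argument array.
object ArgParsing {
  // Returns Right(parsed arguments) on success, or Left(error message) otherwise.
  def parseArgs(args: Array[String]): Either[String, AppArgs] =
    args match {
      // Exactly two arguments, and the second one is a single character.
      case Array(path, letter) if letter.length == 1 =>
        Right(AppArgs(path, letter))
      // Two arguments, but the second one is not a single character.
      case Array(_, letter) =>
        Left(s"Expected a single letter, got: $letter")
      // Any other number of arguments.
      case _ =>
        Left("Usage: SimpleApp <logFile> <letter>")
    }
}
```

Inside main you could match on ArgParsing.parseArgs(args), use the logFile and letter fields in place of the hard-coded path and "a" when you get a Right, and print the message and exit when you get a Left.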

suiterdev
  • suiterdev did you mean args(0) ? – user1050325 Jan 27 '16 at 23:20
  • No, I meant the square brackets. This is an array index in Scala. Something may have changed in Scala since I wrote this, but as of the time of this writing, this form means "the val named 'logfile' should assume as its contents the contents of the first item in the array named 'args', which is at index location number zero." – suiterdev Jan 28 '16 at 21:23
  • Actually @user1050325 turns out I did mean parentheses for this and just didn't realize it - I was probably thinking in Java at the time. Thanks for the catch and I'll update the answer. – suiterdev Jan 28 '16 at 21:37
  • @suiterdev how do you pass different argument types (string, int, double) to the ```def main(args: Array[String]) {``` array to avoid "type mismatch" error? thanks! – thePurplePython Feb 22 '20 at 18:24
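
On the question of argument types raised in the last comment: main always receives an Array[String], so every argument arrives as a string and has to be converted inside the program. A minimal sketch, assuming a hypothetical three-argument layout (file path, number of partitions, some threshold); the TypedArgs object and that layout are made up for illustration:

```scala
import scala.util.Try

// Hypothetical helper, not part of Spark: converts raw string arguments to typed values.
object TypedArgs {
  // Expects (path, partitions, threshold); returns None if the count or a conversion is wrong.
  def parse(args: Array[String]): Option[(String, Int, Double)] =
    args match {
      case Array(path, n, t) =>
        for {
          partitions <- Try(n.toInt).toOption    // None if n is not a valid Int
          threshold  <- Try(t.toDouble).toOption // None if t is not a valid Double
        } yield (path, partitions, threshold)
      case _ => None
    }
}
```

Wrapping toInt and toDouble in Try turns a NumberFormatException into a None, so the caller can print a usage message instead of failing with a stack trace.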
0

Replace the value of the logFile variable with the following:

val logFile = args(0)

Then pass the actual value as an argument when running spark-submit, like this:

spark-submit --master local --class "SimpleApp" target/scala-2.10/simpleapp_2.10-1.0.jar "/Users/username/Spark/README.md" 
OneCricketeer