0

I want to parse command-line arguments while using spark-submit:

SPARK_MAJOR_VERSION=2 spark-submit --class com.partition.source.Pickup --master=yarn --conf spark.ui.port=0000 --driver-class-path /home/hdpusr/jars/postgresql-42.1.4.jar --conf spark.jars=/home/hdpusr/jars/postgresql-42.1.4.jar,/home/hdpusr/jars/postgresql-42.1.4.jar --executor-cores 4 --executor-memory 4G --keytab /home/hdpusr/hdpusr.keytab --principal hdpusr@DEVUSR.COM --files /usr/hdp/current/spark2-client/conf/hive-site.xml,testconnection.properties --name Spark_APP --conf spark.executor.extraClassPath=/home/hdpusr/jars/greenplum.jar sparkload_2.11-0.1.jar ORACLE

I am passing a database name, ORACLE, which I read in the code as:

  def main(args: Array[String]): Unit = {
    val dbtype   = args(0)   // args(0) is already a String
    .....
  }

Is there a way to give the argument a name like "--dbname" and then look up that option in the code to get its value? For example:

SPARK_MAJOR_VERSION=2 spark-submit --class com.partition.source.Pickup --master=yarn --conf spark.ui.port=0000 --driver-class-path /home/hdpusr/jars/postgresql-42.1.4.jar --conf spark.jars=/home/hdpusr/jars/postgresql-42.1.4.jar,/home/hdpusr/jars/postgresql-42.1.4.jar --executor-cores 4 --executor-memory 4G --keytab /home/hdpusr/hdpusr.keytab --principal hdpusr@DEVUSR.COM --files /usr/hdp/current/spark2-client/conf/hive-site.xml,testconnection.properties --name Spark_APP --conf spark.executor.extraClassPath=/home/hdpusr/jars/greenplum.jar sparkload_2.11-0.1.jar --dbname ORACLE

In Java, the classes from the Apache Commons CLI package can be used to do the same:

    import org.apache.commons.cli.CommandLine;
    import org.apache.commons.cli.CommandLineParser;
    import org.apache.commons.cli.DefaultParser;
    import org.apache.commons.cli.HelpFormatter;
    import org.apache.commons.cli.Option;
    import org.apache.commons.cli.Options;
    import org.apache.commons.cli.ParseException;

    public static void main(String[] args) {
       Options options = new Options();
       // Short name "s", long name "ssn", takes a value.
       Option input = new Option("s", "ssn", true, "source system names");
       input.setRequired(false);
       options.addOption(input);
       CommandLineParser parser = new DefaultParser();
       HelpFormatter formatter  = new HelpFormatter();
       CommandLine cmd       = null;
       try {
            cmd = parser.parse(options, args);
            if(cmd.hasOption("s")) {            // True if '-s' or '--ssn' was passed on the CLI. Runs the Recon only for the received SSNs.
            }
       } catch(ParseException e) {
          // Invalid arguments: print the usage text and exit.
          formatter.printHelp("utility-name", options);
          e.printStackTrace();
          System.exit(1);
       } catch(Exception e) {
         e.printStackTrace();
       }
    }

Could anyone let me know whether it is possible to name the command-line arguments and parse them accordingly?

Metadata
  • Just add [commons-cli](https://mvnrepository.com/artifact/commons-cli/commons-cli) to your dependencies and it should work the same. You can use Java dependencies in Scala without any problem. On a side note though, why can't you use `sparkConf` for your options? – sachav May 24 '19 at 10:13
  • Got it. Gonna use that dependency and post the results back. – Metadata May 30 '19 at 10:22
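
The commons-cli suggestion from the comments could look roughly like this in Scala. This is only a sketch: it assumes commons-cli 1.3 or later is on the classpath, and the object name `PickupArgs`, the short option `-d`, and the help name "sparkload" are made up for illustration.

```scala
import org.apache.commons.cli.{DefaultParser, HelpFormatter, Options, ParseException}

// Hypothetical helper; the names here are illustrative, not from the question.
object PickupArgs {
  // Declares a single option, -d / --dbname, that takes one value.
  private val options: Options =
    new Options().addOption("d", "dbname", true, "database type, e.g. ORACLE")

  /** Returns the value of --dbname, or None if the option was not supplied. */
  def dbName(args: Array[String]): Option[String] =
    try {
      val cmd = new DefaultParser().parse(options, args)
      // getOptionValue returns null when the option is absent.
      Option(cmd.getOptionValue("dbname"))
    } catch {
      case e: ParseException =>
        // Unknown option or missing value: print the usage text and rethrow.
        new HelpFormatter().printHelp("sparkload", options)
        throw e
    }
}
```

In `main`, this would be called as `PickupArgs.dbName(args)`, with the arguments placed after the application jar in the spark-submit command, as in the question.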

2 Answers

1

For example, if you pass the argument as --dbname=ORACLE, you can match it with a regex:

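import com.typesafe.config.ConfigException  // from the Typesafe Config library

val pattern = """--dbname=(.*)""".r
val params = args.map {
  case pattern(value) => value  // the pattern has exactly one capture group
  case arg => throw new ConfigException.Generic(s"""unable to parse command-line argument "$arg"""")
}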

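\s matches whitespace, so in principle the pattern could be extended to accept --dbname ORACLE. Note, however, that the shell passes --dbname and ORACLE as two separate elements of args, so it's easier to use the single-string form --dbname=ORACLE.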

Here you can see all the possibilities.
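
To accept both `--dbname=ORACLE` and `--dbname ORACLE` without any extra dependency, something like the following sketch might work. The object name `ArgParser` and the choice to collect everything into a `Map` are assumptions made for illustration, not part of the answer above.

```scala
// Hypothetical helper; a minimal sketch, not a full-featured option parser.
object ArgParser {
  private val KeyValue = """--([^=\s]+)=(.*)""".r  // --key=value in one token
  private val KeyOnly  = """--([^=\s]+)""".r       // --key, value in the next token

  /** Collects `--key=value` and `--key value` pairs into a Map. */
  def parse(args: Array[String]): Map[String, String] = {
    @annotation.tailrec
    def loop(rest: List[String], acc: Map[String, String]): Map[String, String] =
      rest match {
        case Nil =>
          acc
        case KeyValue(key, value) :: tail =>
          loop(tail, acc + (key -> value))
        case KeyOnly(key) :: value :: tail if !value.startsWith("--") =>
          loop(tail, acc + (key -> value))
        case other :: _ =>
          throw new IllegalArgumentException(s"unable to parse command-line argument: $other")
      }
    loop(args.toList, Map.empty)
  }
}
```

In `main`, `ArgParser.parse(args).get("dbname")` would then yield an `Option[String]` holding the database name, whichever of the two forms was used.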

0

If the exact key name does not matter, we can prefix it with spark. (in this case spark.dbname) and pass it as a conf argument, like spark-submit --conf spark.dbname=<> ...., or add it to spark-defaults.conf.
In the user code, we can then read the key with sparkContext.getConf.get("spark.dbname").
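
As a rough sketch of the lookup, assuming Spark is on the classpath: the object name `DbNameFromConf` is hypothetical, and `getOption` is used here instead of `get` so that a missing key yields `None` rather than an exception.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper; spark.dbname is the custom key passed via --conf.
object DbNameFromConf {
  /** Returns the value of spark.dbname, or None if it was not set. */
  def dbName(conf: SparkConf): Option[String] =
    conf.getOption("spark.dbname")
}
```

In the application this would be called as `DbNameFromConf.dbName(sparkContext.getConf)`. The `spark.` prefix matters because spark-submit ignores `--conf` keys that do not start with it.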

DaRkMaN