
I'm trying to pass a value to my Spark program to be used as the delimiter when reading a .dat file. My code looks like this:

val delim = args(0)
val df = spark.read.format("csv").option("delimiter", delim).load("/path/to/file/")

And I run the program with the following command:

spark2-submit --class a.b.c.MyClass My.jar \\u0001

But I get an error saying that multiple characters can't be used as a delimiter. However, when I use the string literal directly instead of reading it from a variable, the code works fine:

val df = spark.read.format("csv").option("delimiter", "\u0001").load("/path/to/file/")

Can someone help me with this?

Shantanu Kher
Amber
  • please check if the following thread helps - https://stackoverflow.com/questions/29928999/passing-command-line-arguments-to-spark-shell – Shantanu Kher Jul 01 '20 at 20:46

1 Answer


The string `"\u0001"` in your source code is a single Unicode character, but what Spark receives from the command line is the literal four-character escape sequence `"\\u0001"`. You need to unescape it explicitly:

val df = spark.read.format("csv").option("delimiter", unescapeUnicode(delim)).load("/path/to/file/")

You can find an unescapeUnicode function in this answer.
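If you'd rather not pull in the linked helper, a minimal sketch of such a function could look like the following. The name `unescapeUnicode` and this regex-based implementation are assumptions for illustration, not the linked answer's code:

```scala
// Minimal sketch (assumed implementation): replace each literal "\" + "uXXXX"
// escape sequence in the input with the character it denotes.
def unescapeUnicode(s: String): String =
  """\\u([0-9a-fA-F]{4})""".r.replaceAllIn(s, m =>
    java.util.regex.Matcher.quoteReplacement(
      Character.toChars(Integer.parseInt(m.group(1), 16)).mkString))

// The six-character command-line argument becomes the single control
// character U+0001, which Spark accepts as a one-character delimiter.
val delim = unescapeUnicode("\\u0001")  // == "\u0001"
```

`Matcher.quoteReplacement` is there so the substituted character is inserted verbatim rather than being interpreted as a replacement pattern.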

Kombajn zbożowy