Task: read a CSV file, add two columns holding lowercase copies of existing columns, sort, and save the file. Problem: when the sort is applied, the output is written as multiple files. Can someone please explain what is happening here?
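import org.apache.spark.sql.functions.{col, lower}

// Read the CSV (first row is the header) and keep only the needed columns
var df = spark.read
  .format("csv")
  .option("header", "true")
  .load(i_file)
  .select("Id", "Name", "Address")
// Add lowercase copies of Name and Address
df = df.withColumn("x_name", lower(col("Name")))
df = df.withColumn("x_address", lower(col("Address")))
df = df.orderBy("x_name") // <--- this line
df.write.option("header", "true").csv(o_file)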
If I remove the orderBy call, only one file is created.
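
A minimal sketch for checking what is likely going on, under stated assumptions: orderBy performs a shuffle that range-partitions the data across multiple partitions (the count is bounded by spark.sql.shuffle.partitions, which defaults to 200), and the writer emits one part file per non-empty partition. The object name SortedSingleFile, the helper toSinglePartitionSorted, the in-memory sample rows, and the temporary output path are all hypothetical stand-ins, not part of the original job. The helper swaps orderBy for repartition(1) followed by sortWithinPartitions, which keeps the rows sorted inside a single partition and so should yield a single part file.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower}

object SortedSingleFile {
  // Hypothetical helper: move all rows into one partition, then sort inside it.
  // With a single partition, a per-partition sort is a sort of the whole dataset.
  def toSinglePartitionSorted(df: DataFrame, sortCol: String): DataFrame =
    df.repartition(1).sortWithinPartitions(sortCol)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sorted-single-file")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Tiny in-memory stand-in for the CSV input (made-up rows)
    val df = Seq(("1", "Bob", "X Street"), ("2", "alice", "Y Road"))
      .toDF("Id", "Name", "Address")
      .withColumn("x_name", lower(col("Name")))
      .withColumn("x_address", lower(col("Address")))

    // Compare partition counts: the global sort may report more than one
    val globallySorted = df.orderBy("x_name")
    println(s"partitions after orderBy: ${globallySorted.rdd.getNumPartitions}")

    val single = toSinglePartitionSorted(df, "x_name")
    println(s"partitions after repartition(1): ${single.rdd.getNumPartitions}")

    // Write to a fresh temporary directory (hypothetical output location)
    val outPath = Files.createTempDirectory("sorted_csv").resolve("out").toString
    single.write.option("header", "true").csv(outPath)

    spark.stop()
  }
}
```

Pulling everything into one partition gives up parallelism, so this only makes sense when the data fits comfortably on a single executor.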