
Task: read a CSV file, add two columns holding lower-cased copies of Name and Address, sort, and save the result. Problem: when the sort is applied, Spark writes multiple output files. Can someone please explain what is happening here?

import org.apache.spark.sql.functions.{col, lower}

var df = spark.read
  .format("csv")
  .option("header", "true")
  .load(i_file)
  .select("Id", "Name", "Address")

df = df.withColumn("x_name", lower(col("Name")))
df = df.withColumn("x_address", lower(col("Address")))
df = df.orderBy("x_name") // <--- this line causes multiple output files
df.write.option("header", "true").csv(o_file)

If I remove the orderBy, only one file is created.
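
For context, a likely explanation: orderBy performs a global sort, which shuffles the data into range partitions, and the writer emits one part file per partition. The partition count after the shuffle is governed by spark.sql.shuffle.partitions (200 by default, though adaptive query execution may reduce it). Without the sort there is no shuffle, so a small input read as a single partition is written as a single file. The following is a minimal sketch of one way to get a single sorted file, assuming the data fits comfortably in one partition. The helper name writeSortedSingleFile and its parameters are hypothetical and not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower}

// Hypothetical helper: the name and parameters are illustrative only.
def writeSortedSingleFile(spark: SparkSession, inputPath: String, outputPath: String): Unit = {
  val df: DataFrame = spark.read
    .format("csv")
    .option("header", "true")
    .load(inputPath)
    .select("Id", "Name", "Address")
    .withColumn("x_name", lower(col("Name")))
    .withColumn("x_address", lower(col("Address")))

  // Move all rows into a single partition, then sort inside that partition.
  // One partition means the writer produces one part file.
  df.repartition(1)
    .sortWithinPartitions("x_name")
    .write
    .option("header", "true")
    .csv(outputPath)
}
```

Note that outputPath is still a directory containing one part-*.csv file (plus marker files such as _SUCCESS), and that forcing a single partition gives up parallelism, so this is only reasonable for small outputs.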

Xavier Guihot
Eyedia Tech

0 Answers