
For example, I have a dataframe that needs some processing and type conversions on its columns, and I keep overwriting the existing dataframe again and again, as in the code below:

var fd = spark.read
  .format("csv")
  .option("inferSchema", "false")
  .option("header", "true")
  .load(csvFile)

fd = fd.withColumn("date", col("date").cast("String"))

I am new to Spark, so I don't know a better approach to this kind of operation.

Any suggestions?
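One common way to avoid the `var` reassignment pattern is to chain the transformations into a single immutable `val`. A minimal sketch, assuming the same `spark` session and `csvFile` path as in the question (the second column cast on `amount` is hypothetical, just to show how further conversions would chain):

```scala
import org.apache.spark.sql.functions.col

// Build the whole pipeline as one expression; no reassignment needed.
val fd = spark.read
  .format("csv")
  .option("inferSchema", "false")
  .option("header", "true")
  .load(csvFile)
  .withColumn("date", col("date").cast("string"))     // same cast as in the question
  .withColumn("amount", col("amount").cast("double")) // hypothetical extra conversion
```

Since DataFrame transformations are lazy and each call returns a new DataFrame, chaining them costs nothing extra at runtime; it just removes the mutable variable.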

Zeeshan
    I doubt there is a difference. In either case, unused objects will be garbage collected. But either way, you should never ever ever use `var` in Spark. In many cases it will break your application before you even start worrying about memory. There are a handful of cases where it won't impact functionality but it's just easier to never use it than to make that distinction – sinanspd Dec 22 '21 at 06:53
  • Thank you for your answer, appreciate it :) – Zeeshan Dec 22 '21 at 07:02

0 Answers