
I have some sales data and created a DataFrame from a CSV file. I need to add two additional columns to this DataFrame, process_date and next_processing_date, so I wrote this:

baseData.withColumn("Return_Grace_period", current_date()).withColumn("Next_processing_date", current_date() + 10).show()

Here current_date() + 10 is causing an issue. In Oracle we can add 10 to a date to get the date 10 days later. How can I do the same in Spark?

Shaido
Learn Hadoop

1 Answer


You can use the date_add function to add a number of days to a date:

baseData.withColumn("Next_processing_date", date_add(current_date(), 10))

To instead subtract a number of days, you can use the matching date_sub function.
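A minimal runnable sketch of both functions, assuming a local SparkSession; the sample rows and column names are illustrative stand-ins for the CSV-based sales data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_date, date_add, date_sub}

object DateMathSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("date-add-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative stand-in for the DataFrame read from the CSV file
    val baseData = Seq(("A", 100), ("B", 200)).toDF("Sales_id", "Amount")

    baseData
      .withColumn("processing_date", current_date())
      // date_add shifts a date column forward by a number of days
      .withColumn("Next_processing_date", date_add(current_date(), 10))
      // date_sub shifts it backward
      .withColumn("Prev_processing_date", date_sub(current_date(), 10))
      .show()

    spark.stop()
  }
}
```

Both functions follow calendar semantics (month and year boundaries are handled for you), matching what `date + 10` gives in Oracle.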

Shaido
  • thanks, but with the Scala SQL notation it is causing an issue. I wrote it like this: `baseData.select($"Sales", $"current_date".as("processing_date"), $"date_add(current_date(),2))".as("grace_period")).show()` – Learn Hadoop Oct 11 '18 at 05:24
  • 1
    @LearnHadoop: `date_add` is not a column, so you can't use the `$` notation here. You can write it as: `baseData.select($"Sales", current_date().as("processing_date"), date_add(current_date(), 2).as("grace_period")).show()`. – Shaido Oct 11 '18 at 05:29
  • 1
    thanks a lot, sir, I appreciate your help – Learn Hadoop Oct 11 '18 at 05:32