I am reading a CSV file with Apache Spark as follows:

spark.read.format("csv").option("header", true).option("inferSchema", true).load(file_path)

There is a timestamp field named 'EventTime' whose values look like 2012-11-12T01:01:00Z

But when I show the field from the DataFrame, its value is converted to 2012-11-12 06:31:00. I think GMT+5:30 is being applied to the value. I see this for every value in the column, and I am not sure what causes it.

I don't want the date-time value to be converted. How can I resolve this?

halfer
Anand
  • Try using org.apache.spark.sql.functions.date_format – Assaf Mendelson Sep 13 '18 at 07:23
  • @AssafMendelson But how can we subtract 5:30 from the date-time, as it has already been converted? – Anand Sep 13 '18 at 07:45
  • I believe that this is a representation issue, i.e. it is showing you the time in the local machine's time zone. If you set the time zone as part of the format, it should print correctly. You can check by converting the time column to long (cast as long) and looking at the actual epoch value – Assaf Mendelson Sep 13 '18 at 08:35
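
Following up on the comments above, here is a minimal sketch of how the representation issue could be checked and worked around. It assumes Spark 2.2 or later, where the `spark.sql.session.timeZone` setting is available. The object name `EventTimeUtc`, its helper methods, the app name, and the `local[*]` master are hypothetical and only there to make the example self-contained. The idea is that the stored instant is unchanged and only the rendering depends on the time zone, so pinning the session time zone to UTC and casting to long should show the original value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format}

// Hypothetical helper object, not part of the original question.
object EventTimeUtc {

  // Build a local session whose SQL time zone is pinned to UTC
  // (assumption: Spark 2.2+, where spark.sql.session.timeZone exists).
  def utcSession(): SparkSession =
    SparkSession.builder()
      .appName("event-time-utc")
      .master("local[*]")
      .config("spark.sql.session.timeZone", "UTC")
      .getOrCreate()

  // Same read as in the question.
  def readEvents(spark: SparkSession, filePath: String): DataFrame =
    spark.read.format("csv")
      .option("header", true)
      .option("inferSchema", true)
      .load(filePath)

  // Add two diagnostic columns:
  //  - epoch_seconds: the stored instant, independent of any time zone
  //  - event_time_utc: the timestamp rendered as a string in the session time zone
  def withUtcColumns(df: DataFrame): DataFrame =
    df.withColumn("epoch_seconds", col("EventTime").cast("long"))
      .withColumn(
        "event_time_utc",
        date_format(col("EventTime"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))

  def main(args: Array[String]): Unit = {
    val spark = utcSession()
    // args(0) is the path to the CSV file.
    withUtcColumns(readEvents(spark, args(0))).show(false)
    spark.stop()
  }
}
```

If the goal is to keep the text exactly as it appears in the file, another option is to leave out `inferSchema` (or supply an explicit schema with a string column), so that 'EventTime' is read as a plain string and never interpreted as a timestamp.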
