I'm new to in Scala, I have dataframe where I'm trying one column of dataframe to date from string in other word like below
1) yyyyMMddHHmmss(20150610120256) ->yyyy-MM-dd HH:mm:ss(2015-06-10 12:02:56)
2) yyyyMMddHHmmss(20150611 ) ->yyyy-MM-dd(2015-06-11)
First case i'm able achieve successfully but problem with second case where time is miss due to of this i'm not bale to convert into date.More details you could get below.Any help will be appreciated.
df.printSchema
root
|-- TYPE: string (nullable = true)
|-- CODE: string (nullable = true)
|-- SQ_CODE: string (nullable = true)
|-- RE_TYPE: string (nullable = true)
|-- VERY_ID: long (nullable = true)
|-- IN_DATE: string (nullable = true)
df.show
Input
+-----+-------+---------+---------+-------------------+-----------------+
| TYPE| CODE| SQ_CODE| RE_TYPE | VERY_ID| IN_DATE |
+-----+-------+---------+---------+-------------------+-----------------+
| F | 000544| 2017002| OP | 95032015062763298| 20150610120256 |
| F | 000544| 2017002| LD | 95032015062763261| 20150611 |
| F | 000544| 2017002| AK | 95037854336743246| 20150611012356 |
+-----+-------+---------+--+------+-------------------+-----------------+
df=df.withColumn("IN_DATE",when(lit(length(regexp_replace(df("IN_DATE"),"\\s+",""))) === lit(8) ,
to_date(from_unixtime(regexp_replace(df("IN_DATE"),"\\s+",""),"yyyyMMdd").cast("date")))
.otherwise(unix_timestamp(df("IN_DATE"),"yyyyMMddHHmmss").cast("timestamp")))
Actual output
+-----+-------+---------+---------+-------------------+----------------------+
| TYPE| CODE| SQ_CODE| RE_TYPE | VERY_ID| IN_DATE |
+-----+-------+---------+---------+-------------------+----------------------+
| F | 000544| 2017002| OP | 95032015062763298| 2015-06-10 12:02:56 |
| F | 000544| 2017002| LD | 95032015062763261| null |
| F | 000544| 2017002| AK | 95037854336743246| 2015-06-11 01:23:56 |
+-----+-------+---------+--+------+-------------------+----------------------+
df=df.withColumn("IN_DATE",when(lit(length(regexp_replace(df("IN_DATE"),"\\s+",""))) === lit(8) ,
to_date(from_unixtime(regexp_replace(df("IN_DATE"),"\\s+",""),"yyyyMMdd").cast("timestamp")))
.otherwise(unix_timestamp(df("IN_DATE"),"yyyyMMddHHmmss").cast("timestamp")))
Actual output
+-----+-------+---------+---------+-------------------+----------------------+
| TYPE| CODE| SQ_CODE| RE_TYPE | VERY_ID| IN_DATE |
+-----+-------+---------+---------+-------------------+----------------------+
| F | 000544| 2017002| OP | 95032015062763298| 2015-06-10 12:02:56 |
| F | 000544| 2017002| LD | 95032015062763261| 2015-06-11 00:00:00 |
| F | 000544| 2017002| AK | 95037854336743246| 2015-06-11 01:23:56 |
+-----+-------+---------+--+------+-------------------+----------------------+
Expected output
+-----+-------+---------+---------+-------------------+----------------------+
| TYPE| CODE| SQ_CODE| RE_TYPE | VERY_ID| IN_DATE |
+-----+-------+---------+---------+-------------------+----------------------+
| F | 000544| 2017002| OP | 95032015062763298| 2015-06-10 12:02:56 |
| F | 000544| 2017002| LD | 95032015062763261| 2015-06-11 |
| F | 000544| 2017002| AK | 95037854336743246| 2015-06-11 01:23:56 |
+-----+-------+---------+--+------+-------------------+----------------------+