1

I have a column containing string data like "2023-03-13T15:18:14+0700". My final goal is to convert it to a proper date format like "2023-03-13 15:18:14". It's best to convert the time to GMT+7 (my location) and then remove the "T" and "+XXXX" part. But if it's too hard or impossible to do, I just need to remove the "T" and "+0700" since most of my data is "+0700".

I read many posts on SOF but had no luck so far. For example, here, here, and the closest one is this but no luck since their format is a bit different from mine.

Below is what I got from the latest post:

object test extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._
  val df = Seq("2023-03-13T15:18:14+0700").toDF("time")

  val result = df.select(to_timestamp(col("time"), "yyyy-MM-dd'T'hh:mm:ss.SSSXXX").alias("newtime"))
  result.show(truncate = false) // Null

  val result1 = df.select(to_timestamp(col("time"), "yyyy-MM-dd'T'hh:mm:ssXXX").alias("newtime"))
  result1.show(truncate = false) // Null
}
Abdennacer Lachiheb
  • 4,388
  • 7
  • 30
  • 61

2 Answers2

1

Use cast() transformation

from pyspark.sql.functions import col
from pyspark.sql.types import TimestampType

df = df.withColumn("time", col("time").cast(TimestampType()))

df.show()

Output

+-------------------+
|               time|
+-------------------+
|2023-03-13 08:18:14|
+-------------------+

Schema

root
 |-- time: timestamp (nullable = true)
arudsekaberne
  • 830
  • 4
  • 11
1

You are not using the right format, the date you have is of type ISO 8601, the right format is yyyy-MM-dd'T'HH:mm:ssZ, here's how to use it using to_timestamp function:

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([["2023-03-13T15:18:14+0700"]], ['time'])
df = df.withColumn("timestamp_utc", to_timestamp("time", "yyyy-MM-dd'T'HH:mm:ssZ"))
df.show(truncate=False)
df.printSchema()

+------------------------+-------------------+
|time                    |timestamp_utc      |
+------------------------+-------------------+
|2023-03-13T15:18:14+0700|2023-03-13 09:18:14|
+------------------------+-------------------+

root
 |-- time: string (nullable = true)
 |-- timestamp_utc: timestamp (nullable = true)
Abdennacer Lachiheb
  • 4,388
  • 7
  • 30
  • 61