-1

The conversion of the string to datetime is failing. The data in the dataframe has the following format: "2020-08-05T12:34:10.800046".
I used pattern yyyy-MM-ddTHH:mm:ss.SSSSSS

config_df.withColumn(
    "modifiedDate",
    F.to_timestamp(config_df["modifiedDate"], "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"),
).show()

+------------+
|modifiedDate|
+------------+
|        null|
+------------+

The execution works without problem but all values in the updated column are NULL. Which format should I use?

Steven
  • 14,048
  • 6
  • 38
  • 73
Tobias Bruckert
  • 348
  • 2
  • 12
  • 1
    Welcome to Stack Overflow. Your question is quite good, but would be nice to share a https://stackoverflow.com/help/minimal-reproducible-example, so we can run the code ourselves and help you. Which format is in the dataset? What is the complete error message? – Ramon Medeiros Sep 08 '20 at 11:57
  • thanks for the hint. I dont get any error it is just returning NULLs – Tobias Bruckert Sep 08 '20 at 12:19

1 Answers1

0

According to this post, SSSis for milliseconds. Therefore, it matches the first 3 digits 800 in your 800046, no matter how many S you add.

I couldn't find any pattern that match your date, so you first need to update your string to keep only 3 digits at the end. With a regex for example

a = [
    ("2020-08-05T12:34:10.800123",),
]
b = ["modifiedDate"]
df = spark.createDataFrame(a, b)

df.withColumn(
    "modifiedDate",
    F.to_timestamp(
        F.regexp_extract(
            "modifiedDate", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}", 0
        ),
        "yyyy-MM-dd'T'HH:mm:ss.SSS",
    ),
).show()


+-------------------+
|       modifiedDate|
+-------------------+
|2020-08-05 12:34:10|
+-------------------+
Steven
  • 14,048
  • 6
  • 38
  • 73