
I have the time as a string like 103400, and I need to convert it into 10:34:00 in a time format using PySpark only. Suppose the DataFrame is named u and the column is hhmm_utente.
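
The colon-insertion step being asked about can be sanity-checked outside Spark with Python's `re` module; this is a minimal sketch that assumes the column always holds exactly six digits. In PySpark the same pattern would go into `regexp_replace` (untested here):

```python
import re

def add_colons(hhmmss: str) -> str:
    # Split a six-digit HHMMSS string into HH:MM:SS,
    # e.g. '103400' -> '10:34:00'.
    return re.sub(r"^(\d{2})(\d{2})(\d{2})$", r"\1:\2:\3", hhmmss)

print(add_colons("103400"))  # 10:34:00
```

The PySpark analogue would be something like `u.withColumn("hhmm_utente", regexp_replace(col("hhmm_utente"), "(\\d{2})(\\d{2})(\\d{2})", "$1:$2:$3"))`, using capture-group references instead of `\1`-style backreferences.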

Peter
  • Possible duplicate of [How to convert datetime from string format into datetime format in pyspark?](https://stackoverflow.com/questions/39198062/how-to-convert-datetime-from-string-format-into-datetime-format-in-pyspark) – erncyp Oct 07 '19 at 10:00
  • Thanks for your response. My case is different: the time is like 103400, and I first need to convert it to 10:34:00. – Peter Oct 07 '19 at 10:05
  • I don't think "10:34:00" on its own is a valid data type: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/types/DataTypes.html. I was going to recommend timedelta, but that doesn't exist in pyspark (https://stackoverflow.com/questions/52702199/timedelta-in-pyspark-dataframes-typeerror?rq=1). Perhaps convert it to seconds as that post suggests: convert each part of the hour, minute, second to get total seconds as integers? – erncyp Oct 07 '19 at 10:35
  • You could also add a dummy date like 01/01/2000 as a string, then use something like this: `df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()` https://stackoverflow.com/a/41273036/4703367 – erncyp Oct 07 '19 at 10:38
  • First I need to separate hours and minutes with a colon (:). I found the following code that separates hours and minutes, but I also need to separate minutes and seconds. The code is: `u.withColumn("hhmm_utente", regexp_replace(col("hhmm_utente"), "(\\d{2})(\\d{2})", "$1:$2")).show()` – Peter Oct 07 '19 at 10:40
  • If you use to_timestamp you don't need to do that. You could use: `df.select(to_timestamp(df.t, 'yyyy-MM-dd HHmmss').alias('dt')).collect()` (I removed the colons) – erncyp Oct 07 '19 at 10:42
  • If I were doing it in pandas, I would split the string column into three different columns. You can do that in pyspark too: https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns – erncyp Oct 07 '19 at 10:43
  • The problem is that the time is stored in a single column with values like 103400 (meaning 10:34:00), and the date is in another single column. I just need to add colons to 103400 so it becomes 10:34:00, then concatenate it with the date and convert the result to a datetime. – Peter Oct 07 '19 at 10:44
  • @erncyp By the way, I tried your solution `df.select(to_timestamp(df.t, 'yyyy-MM-dd HHmmss').alias('dt')).collect()`, and it doesn't work. – Peter Oct 07 '19 at 10:52
  • Did you try splitting it into three different columns then joining them with colon in between? – erncyp Oct 07 '19 at 12:30
  • Possible duplicate of [Pyspark, add colon to separate string](https://stackoverflow.com/questions/58270077/pyspark-add-colon-to-separate-string) – pault Oct 07 '19 at 14:02
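
The dummy-date route suggested in the comments above can be checked with the standard library: prepend a fixed date, then parse the whole string with a colon-free time pattern (`%H%M%S` here, analogous to Spark's `HHmmss`). This is a sketch of the idea, not the actual Spark call:

```python
from datetime import datetime

raw = "103400"
# Prepend a dummy date so the whole string parses as a timestamp,
# then keep only the time component.
dt = datetime.strptime("2000-01-01 " + raw, "%Y-%m-%d %H%M%S")
print(dt.time())  # 10:34:00
```

In PySpark the comments' equivalent would be along the lines of `to_timestamp(concat(lit("2000-01-01 "), col("hhmm_utente")), "yyyy-MM-dd HHmmss")`; the exact column names and the concat step are assumptions here, not code from the question.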

0 Answers