I have a time value like 103400 stored as a string, and I need to convert it into 10:34:00 in a time format using PySpark only. Suppose the DataFrame is named u and the column is hhmm_utente.
-
Possible duplicate of [How to convert datetime from string format into datetime format in pyspark?](https://stackoverflow.com/questions/39198062/how-to-convert-datetime-from-string-format-into-datetime-format-in-pyspark) – erncyp Oct 07 '19 at 10:00
-
Thanks for your response. My case is different: the time is like 103400, and I first need to convert it to 10:34:00 – Peter Oct 07 '19 at 10:05
-
I don't think "10:34:00" on its own is a valid data type: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/types/DataTypes.html. I was going to recommend timedelta, but that doesn't exist in PySpark (https://stackoverflow.com/questions/52702199/timedelta-in-pyspark-dataframes-typeerror?rq=1). Perhaps convert it to seconds as that post suggests: convert each part of the hour, minute, and second to get total seconds as an integer? – erncyp Oct 07 '19 at 10:35
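A sketch of the total-seconds idea from the comment above: the arithmetic is just hours × 3600 + minutes × 60 + seconds. Shown here in plain Python so it can be checked; the equivalent PySpark expression (using `substring` and `cast`, untested here) is given in a comment.

```python
def hhmmss_to_seconds(hhmmss: str) -> int:
    """Convert a fixed-width 'HHmmss' string like '103400' to seconds since midnight."""
    hours = int(hhmmss[0:2])
    minutes = int(hhmmss[2:4])
    seconds = int(hhmmss[4:6])
    return hours * 3600 + minutes * 60 + seconds

print(hhmmss_to_seconds("103400"))  # 38040

# PySpark sketch (untested; assumes the DataFrame `u` from the question):
# u.withColumn("total_secs",
#              substring(col("hhmm_utente"), 1, 2).cast("int") * 3600
#              + substring(col("hhmm_utente"), 3, 2).cast("int") * 60
#              + substring(col("hhmm_utente"), 5, 2).cast("int"))
```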
-
You could also add a dummy date like 01/01/2000 as a string, then use something like this: `df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()` https://stackoverflow.com/a/41273036/4703367 – erncyp Oct 07 '19 at 10:38
-
First I need to separate hours and minutes with a colon (:). I found the following code, which separates hours and minutes, but I also need to separate minutes and seconds. The code is: `u.withColumn("hhmm_utente", regexp_replace(col("hhmm_utente"), "(\\d{2})(\\d{2})", "$1:$2")).show()` – Peter Oct 07 '19 at 10:40
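The `regexp_replace` pattern in the comment above extends naturally to three capture groups. Spark's `regexp_replace` uses Java-style `$1` backreferences; the pattern itself can be sanity-checked with Python's `re.sub` (which uses `\1`-style backreferences), with the corresponding Spark call shown as a comment.

```python
import re

# Three two-digit groups: hours, minutes, seconds.
PATTERN = r"^(\d{2})(\d{2})(\d{2})$"

def add_colons(hhmmss: str) -> str:
    """Turn '103400' into '10:34:00'."""
    return re.sub(PATTERN, r"\1:\2:\3", hhmmss)

print(add_colons("103400"))  # 10:34:00

# Equivalent PySpark sketch (untested; DataFrame `u` from the question):
# u.withColumn("hhmm_utente",
#              regexp_replace(col("hhmm_utente"),
#                             "(\\d{2})(\\d{2})(\\d{2})", "$1:$2:$3")).show()
```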
-
If you use to_timestamp, you don't need to do that. You could use: `df.select(to_timestamp(df.t, 'yyyy-MM-dd HHmmss').alias('dt')).collect()` (I removed the colons) – erncyp Oct 07 '19 at 10:42
-
If I were doing it in pandas, I would split the string column into three different columns. You can do that in PySpark too: https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns – erncyp Oct 07 '19 at 10:43
-
The problem is that the time comes as a single column with values like 103400 (meaning 10:34:00), and the date comes in another single column. I just need to add colons to 103400 so it becomes 10:34:00, then concatenate it with the date, then convert the result to a datetime – Peter Oct 07 '19 at 10:44
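The concatenate-then-parse plan described above can skip the colon step entirely: `to_timestamp` can parse the raw `HHmmss` digits directly if the format string has no colons either. The parsing logic is illustrated with Python's `datetime` below; the PySpark form (with an assumed date column name, since the question doesn't give one) is shown as a comment.

```python
from datetime import datetime

# Concatenate a date string with the raw HHmmss time, then parse both at once.
combined = "2019-10-07" + " " + "103400"
dt = datetime.strptime(combined, "%Y-%m-%d %H%M%S")
print(dt)  # 2019-10-07 10:34:00

# PySpark sketch (untested; `data` is an assumed name for the date column):
# u.withColumn("ts",
#              to_timestamp(concat_ws(" ", col("data"), col("hhmm_utente")),
#                           "yyyy-MM-dd HHmmss"))
```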
-
@erncyp By the way, I tried your solution `df.select(to_timestamp(df.t, 'yyyy-MM-dd HHmmss').alias('dt')).collect()`, and it doesn't work – Peter Oct 07 '19 at 10:52
-
Did you try splitting it into three different columns then joining them with colon in between? – erncyp Oct 07 '19 at 12:30
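The split-and-rejoin suggestion above can also be done with fixed-width substrings instead of a regex, since the input is always six digits. The slicing logic is checked in plain Python; the PySpark version (using `substring`, which is 1-indexed in Spark, plus `concat_ws`) is sketched in a comment.

```python
def split_and_join(hhmmss: str) -> str:
    """Slice '103400' into ('10', '34', '00') and rejoin with colons."""
    parts = (hhmmss[0:2], hhmmss[2:4], hhmmss[4:6])
    return ":".join(parts)

print(split_and_join("103400"))  # 10:34:00

# PySpark sketch (untested; note Spark's substring() is 1-indexed):
# u.withColumn("hhmm_utente",
#              concat_ws(":",
#                        substring(col("hhmm_utente"), 1, 2),
#                        substring(col("hhmm_utente"), 3, 2),
#                        substring(col("hhmm_utente"), 5, 2)))
```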
-
Possible duplicate of [Pyspark, add colon to separate string](https://stackoverflow.com/questions/58270077/pyspark-add-colon-to-separate-string) – pault Oct 07 '19 at 14:02