44

I have a data frame with a column of unix timestamp(eg.1435655706000), and I want to convert it to data with format 'yyyy-MM-DD', I've tried nscala-time but it doesn't work.

val time_col = sqlc.sql("select ts from mr").map(_(0).toString.toDateTime)
time_col.collect().foreach(println)

and I got error: java.lang.IllegalArgumentException: Invalid format: "1435655706000" is malformed at "6000"

Hammad Haleem
  • 1,374
  • 1
  • 16
  • 26
youngchampion
  • 599
  • 1
  • 4
  • 5
  • http://stackoverflow.com/questions/18680398/convert-seconds-since-epoch-to-joda-datetime-in-scala – Nikita Jun 30 '15 at 09:54
  • Solved, import org.joda.time._, sqlc.sql("select ts from mr").map(line => new DateTime(line(0)).toString("yyyy-MM-dd")) – youngchampion Jul 01 '15 at 07:55

7 Answers7

37

Here it is using Scala DataFrame functions: from_unixtime and to_date

// NOTE: divide by 1000 required if milliseconds
// e.g. 1446846655609 -> 2015-11-06 21:50:55 -> 2015-11-06 
mr.select(to_date(from_unixtime($"ts" / 1000))) 
Logan
  • 53
  • 5
Marsellus Wallace
  • 17,991
  • 25
  • 90
  • 154
  • Since I want to have it in UTC I did the following: to_utc_timestamp(from_unixtime($"buyOrdrEntryTime"/1000,"yyyy-MM-dd hh:mm:ss"),"Europe/Berlin").as("buyOrdrEntryTime") – Playing With BI Aug 09 '18 at 09:50
29

Since spark1.5 , there is a builtin UDF for doing that.

val df = sqlContext.sql("select from_unixtime(ts,'YYYY-MM-dd') as `ts` from mr")

Please check Spark 1.5.2 API Doc for more info.

Yuan Zhao
  • 479
  • 4
  • 6
  • 3
    You can also import org.apache.spark.sql.functions._ to use these functions in function calls. Example: df.select(from_unixtime($"ts_col"/1000,"yyyy-MM-dd")).toDF("event_date").groupBy("event_date").count – panther Jan 11 '17 at 19:37
  • 10
    The problem is current time zone will be used for conversion, not UTC – Oleg Jan 19 '17 at 09:20
  • 3
    Hi @Oleg , I solved the UTC problem in this way: to_utc_timestamp(from_unixtime($"buyOrdrEntryTime"/1000,"yyyy-MM-dd hh:mm:ss"),"Europe/Berlin").as("buyOrdrEntryTime"). It is important to specify the Time Zone in this way in order to avoid day light savings problems. Hope this helps. – Playing With BI Aug 09 '18 at 09:49
  • @Playing With BI is not UTC if the timestamp were not generated in Berlin's timezone. If your data is in UTC and you want to be sure it is considered as UTC, you can specify Europe/London. – ZettaP Feb 05 '20 at 07:34
18
import org.joda.time.{DateTime, DateTimeZone}
import org.joda.time.format.DateTimeFormat

You need to import the following libraries.

val stri = new DateTime(timeInMillisec).toString("yyyy/MM/dd")

Or adjusting to your case :

 val time_col = sqlContext.sql("select ts from mr")
                     .map(line => new DateTime(line(0).toInt).toString("yyyy/MM/dd"))

There could be another way :

  import com.github.nscala_time.time.Imports._
  
  val date = (new DateTime() + ((threshold.toDouble)/1000).toInt.seconds )
             .toString("yyyy/MM/dd")

Hope this helps :)

Matthew D. Scholefield
  • 2,977
  • 3
  • 31
  • 42
Hammad Haleem
  • 1,374
  • 1
  • 16
  • 26
7

You needn't convert to String before applying toDataTime with nscala_time

import com.github.nscala_time.time.Imports._

scala> 1435655706000L.toDateTime
res4: org.joda.time.DateTime = 2015-06-30T09:15:06.000Z

`

Orar
  • 940
  • 1
  • 10
  • 13
5

I have solved this issue using the joda-time library by mapping on the DataFrame and converting the DateTime into a String :

import org.joda.time._
val time_col = sqlContext.sql("select ts from mr")
                         .map(line => new DateTime(line(0)).toString("yyyy-MM-dd"))
eliasah
  • 39,588
  • 11
  • 124
  • 154
youngchampion
  • 599
  • 1
  • 4
  • 5
5

You can use the following syntax in Java

input.select("timestamp)
            .withColumn("date", date_format(col("timestamp").$div(1000).cast(DataTypes.TimestampType), "yyyyMMdd").cast(DataTypes.IntegerType))
Alex Stanovsky
  • 1,286
  • 1
  • 13
  • 28
3

What you can do is:

input.withColumn("time", concat(from_unixtime(input.col("COL_WITH_UNIX_TIME")/1000,
"yyyy-MM-dd'T'HH:mm:ss"), typedLit("."), substring(input.col("COL_WITH_UNIX_TIME"), 11, 3), 
typedLit("Z")))

where time is a new column name and COL_WITH_UNIX_TIME is the name of the column which you want to convert. This will give data in millis, making your data more accurate, like: "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"