I'm reading in CSV files in which one column holds a string that should be converted to a datetime. The string has the form MM/dd/yyyy HH:mm. However, when I try to parse it using Joda-Time, I always get the error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported

I don't know what exactly the problem is...

val input = c.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  val date: DateTime = formatter.parseDateTime(p(0))
  StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()

Anybody who can help?

Rohan Aletty
Giselle Van Dongen

1 Answer

I don't know what exactly the problem is...

Well, the source of the problem is pretty much described by the error message: Spark SQL doesn't support Joda-Time DateTime as an input. A valid input for a date field is java.sql.Date (see Spark SQL and DataFrame Guide, Data Types for reference).

The simplest solution is to adjust the StockData class so it takes java.sql.Date as an argument, and to replace:

val date: DateTime = formatter.parseDateTime(p(0))

with something like this:

val date: java.sql.Date = new java.sql.Date(
  formatter.parseDateTime(p(0)).getMillis)

or

val date: java.sql.Timestamp = new java.sql.Timestamp(
  formatter.parseDateTime(p(0)).getMillis)

if you want to preserve the hours and minutes.
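For illustration, here is a minimal sketch of the string-to-Timestamp conversion on its own, outside Spark. It assumes values shaped like "11/13/2015 10:10" and uses java.text.SimpleDateFormat from the standard library instead of Joda-Time so that it runs without extra dependencies. The helper name parseTimestamp and the sample value are made up for this example.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical helper: parse a "MM/dd/yyyy HH:mm" string into a
// java.sql.Timestamp, a type Spark SQL can map to a timestamp column.
def parseTimestamp(s: String): Timestamp = {
  // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
  val format = new SimpleDateFormat("MM/dd/yyyy HH:mm")
  format.setLenient(false) // throw on impossible values instead of rolling them over
  new Timestamp(format.parse(s).getTime)
}

// Made-up sample value, interpreted in the JVM's default time zone.
val ts: Timestamp = parseTimestamp("11/13/2015 10:10")
println(ts)
```

Inside the map over the CSV lines, such a helper could be called as parseTimestamp(p(0)), provided the corresponding StockData field is declared as java.sql.Timestamp.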

If you are thinking about using window functions with a range clause, a better option is to pass the string to a DataFrame and convert it to an integer timestamp:

import org.apache.spark.sql.functions.unix_timestamp

df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))

See Spark Window Functions - rangeBetween dates for details.
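The unix_timestamp conversion above can be tried end to end with a sketch like the following. It assumes a Spark version that provides SparkSession (2.0 or later, which is newer than the setup in the question) and a local master. The column names and the single sample row are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.unix_timestamp

// Assumption: a local session for experimentation only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("unix-timestamp-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample row: the date stays a plain string when the DataFrame is built.
val df = Seq(("11/13/2015 10:10", 117.5)).toDF("date", "close")

val withTs = df
  // Seconds since the epoch as a long; usable as the ordering column of a range window.
  .withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
  // Optional: cast the integer back to a timestamp column for display.
  .withColumn("datetime", $"ts".cast("timestamp"))

withTs.printSchema()
```

Keeping the integer ts column alongside the timestamp makes it possible to express range frames in seconds while still having a readable datetime column.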

zero323
  • Hmm, OP can load the date as a String and then convert to a DateType too right (since their goal is to have this as a DataFrame)? – Rohan Aletty Nov 13 '15 at 09:31
  • True, although I think it requires an intermediate `timestamp` in this case. `to_date` doesn't take a format string, and a simple `cast(DateType)` will fail due to the invalid input format. Do you see any other option? – zero323 Nov 13 '15 at 09:48
  • [UDF](http://stackoverflow.com/questions/29909448/add-new-column-in-dataframe-base-on-existing-column)? Or is that overkill? – Rohan Aletty Nov 13 '15 at 09:52
  • Maybe a little, but why not? Personally I would probably go with integer timestamps especially considering [the previous question](http://stackoverflow.com/q/33650880/1560062). – zero323 Nov 13 '15 at 10:07
  • `val dateFormat: DateFormat = new SimpleDateFormat("MM/dd/yyyy HH:mm"); val date: java.sql.Timestamp = new Timestamp(dateFormat.parse(p(0)).getTime)` Thanks a lot! With Timestamp I get exactly what I want. Thank you for setting me on the right track. – Giselle Van Dongen Nov 13 '15 at 10:10