  • Spark 2.1.1
  • Scala 2.11.8
  • Java 1.8.0_131 (64 bit)
  • Linux Ubuntu 16.04 LTS

In my Spark application, I want to compute the yearly average temperature across all weather stations. For this, I use records with the following structure:

type TemperatureRecord = (LocalDate, Location, Double)
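For reference, a minimal plain-Scala sketch of this record type (the `Location` case class with `latitude`/`longitude` fields is assumed from the schema shown further down):

```scala
import java.time.LocalDate

object TemperatureModel {
  // Assumed shape of Location, matching the struct in the printed schema
  case class Location(latitude: Double, longitude: Double)

  type TemperatureRecord = (LocalDate, Location, Double)

  // One record matching the first sample row
  val sample: TemperatureRecord =
    (LocalDate.of(1975, 1, 1), Location(70.933, -8.667), -4.888888888888889)
}
```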

Below is a sample of my data:

(1975-01-1,Location(70.933,-8.667),-4.888888888888889)
(1976-01-1,Location(70.933,-8.667),-11.88888888888889)
(1977-01-1,Location(70.933,-8.667),-13.61111111111111)

I transform my RDD into a Dataset like this:

val ds = rdd.toDF("date", "location", "temperature").as[TemperatureRecord]

To store my date, I must use the type java.time.LocalDate. For this reason, I add the following implicit encoders (otherwise the Dataset can't be created and Spark throws the error: No Encoder found for java.time.LocalDate).

implicit def singleEncoder[A](implicit c: ClassTag[A]): Encoder[A] = Encoders.kryo[A](c)

implicit def tuple2Encoder[A1, A2](
                                          implicit e1: Encoder[A1],
                                          e2: Encoder[A2]
                                      ): Encoder[(A1, A2)] = Encoders.tuple[A1, A2](e1, e2)

implicit def tuple3Encoder[A1, A2, A3](
                                              implicit e1: Encoder[A1],
                                              e2: Encoder[A2],
                                              e3: Encoder[A3]
                                          ): Encoder[(A1, A2, A3)] = Encoders.tuple[A1, A2, A3](e1, e2, e3)

Below is the dataset schema:

root
 |-- date: binary (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- latitude: double (nullable = true)
 |    |-- longitude: double (nullable = true)
 |-- temperature: double (nullable = true)

And a sample of the data:

+--------------------+---------------+--------------------+
|                date|       location|         temperature|
+--------------------+---------------+--------------------+
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -4.888888888888889|
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -11.88888888888889|
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -13.61111111111111|
+--------------------+---------------+--------------------+

Now my date is stored in binary format, and my problem is that I lose the original date value. What can I do so that my "date" field gives me back the original value (i.e. the date)?

  • Exact duplicate: https://stackoverflow.com/questions/45192864/how-to-use-java-time-localdate-in-datasets-fails-with-java-lang-unsupportedoper – philantrovert Jul 21 '17 at 11:00
  • My question isn't about the pure conversion from RDD to Dataset (which is what the question you mention asks, https://stackoverflow.com/questions/45192864/how-to-use-java-time-localdate-in-datasets-fails-with-java-lang-unsupportedoper), but about how to fetch the original value of the date field. At the moment I only have the binary representation :( – JimyRyan Jul 21 '17 at 11:20

0 Answers