- Spark 2.1.1
- Scala 2.11.8
- Java 1.8.0_131 (64 bit)
- Linux Ubuntu 16.04 LTS
With my Spark application, I want to compute the yearly average temperature across all weather stations. For this, I have defined the following record type for my RDD:
type TemperatureRecord = (LocalDate, Location, Double)
Below is a sample of my data:
{Tuple3@9233} "(1975-01-1,Location(70.933,-8.667),-4.888888888888889)"
{Tuple3@9234} "(1976-01-1,Location(70.933,-8.667),-11.88888888888889)"
{Tuple3@9235} "(1977-01-1,Location(70.933,-8.667),-13.61111111111111)"
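For reference, Location is a simple case class with latitude and longitude, and the RDD is built roughly like this (a minimal sketch, assuming spark is my SparkSession; in my real code the records are parsed from weather station files):

import java.time.LocalDate
import org.apache.spark.rdd.RDD

case class Location(latitude: Double, longitude: Double)
type TemperatureRecord = (LocalDate, Location, Double)

// Sketch only: the real records come from parsed weather station files.
val rdd: RDD[TemperatureRecord] = spark.sparkContext.parallelize(Seq(
  (LocalDate.of(1975, 1, 1), Location(70.933, -8.667), -4.888888888888889),
  (LocalDate.of(1976, 1, 1), Location(70.933, -8.667), -11.88888888888889),
  (LocalDate.of(1977, 1, 1), Location(70.933, -8.667), -13.61111111111111)
))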
I convert my RDD to a Dataset as follows:
val ds = rdd.toDF("date", "location", "temperature").as[TemperatureRecord]
To store my dates, I must use the type java.time.LocalDate, so I add the following implicit encoders (otherwise the Dataset conversion fails with the error: No Encoder found for java.time.LocalDate).
import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}

// Fallback: encode any single type by serializing it with Kryo.
implicit def singleEncoder[A](implicit c: ClassTag[A]): Encoder[A] = Encoders.kryo[A](c)

// Build tuple encoders from the encoders of their components.
implicit def tuple2Encoder[A1, A2](
  implicit e1: Encoder[A1],
           e2: Encoder[A2]
): Encoder[(A1, A2)] = Encoders.tuple[A1, A2](e1, e2)

implicit def tuple3Encoder[A1, A2, A3](
  implicit e1: Encoder[A1],
           e2: Encoder[A2],
           e3: Encoder[A3]
): Encoder[(A1, A2, A3)] = Encoders.tuple[A1, A2, A3](e1, e2, e3)
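As far as I understand, Encoders.kryo serializes the whole value into a single binary column, which would explain what I see in the schema below. A quick check of the encoder resolved for LocalDate (a sketch, assuming the imports and implicits above are in scope):

val dateEncoder = implicitly[Encoder[java.time.LocalDate]]
// Prints something like: StructType(StructField(value,BinaryType,true))
println(dateEncoder.schema)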
Below is the dataset schema:
root
 |-- date: binary (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- latitude: double (nullable = true)
 |    |-- longitude: double (nullable = true)
 |-- temperature: double (nullable = true)
And a sample of the data:
+--------------------+---------------+--------------------+
|                date|       location|         temperature|
+--------------------+---------------+--------------------+
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -4.888888888888889|
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -11.88888888888889|
|[01 00 6A 61 76 6...|[70.933,-8.667]|  -13.61111111111111|
+--------------------+---------------+--------------------+
Now my date is stored in binary format, and my problem is that I lose the original date value. What can I do so that my "date" field gives me back the original value (i.e. the date)?
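For example, ideally ds.show() would print human-readable dates, something like:

+----------+---------------+------------------+
|      date|       location|       temperature|
+----------+---------------+------------------+
|1975-01-01|[70.933,-8.667]|-4.888888888888889|
+----------+---------------+------------------+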