Questions tagged [apache-spark-encoders]

54 questions
165
votes
9 answers

How to store custom objects in Dataset?

According to Introducing Spark Datasets: As we look forward to Spark 2.0, we plan some exciting improvements to Datasets, specifically: ... Custom encoders – while we currently autogenerate encoders for a wide variety of types, we’d like to…
zero323
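A minimal sketch of the Kryo fallback the top answers to this question describe, assuming Spark 2.x; MyObj is a stand-in for any class Spark has no built-in encoder for:

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    // Stand-in for any class Spark cannot derive an encoder for.
    class MyObj(val x: Int) extends Serializable

    object KryoEncoderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("kryo").getOrCreate()
        // Fall back to an opaque binary (Kryo) encoding.
        implicit val myObjEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]
        val ds = spark.createDataset(Seq(new MyObj(1), new MyObj(2)))
        ds.map(_.x)(Encoders.scalaInt).show()
        spark.stop()
      }
    }

The cost is that the Dataset's contents become a single binary column, so they cannot be inspected or filtered with SQL expressions.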
66
votes
3 answers

Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?

Spark 2.0 (final) with Scala 2.11.8. The following super simple code yields the compilation error Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported…
clay
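The two fixes the answers converge on are importing spark.implicits._ and defining the case class at the top level of the file. A minimal sketch (Person is a placeholder):

    import org.apache.spark.sql.SparkSession

    // The case class must live at the top level of the file (not inside the
    // method that uses it), or the compiler cannot materialize a TypeTag.
    case class Person(name: String, age: Int)

    object ImplicitsFix {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("implicits").getOrCreate()
        import spark.implicits._ // supplies the Encoder for Product types
        val ds = Seq(Person("Ann", 30)).toDS()
        ds.show()
        spark.stop()
      }
    }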
43
votes
4 answers

Encoder error while trying to map dataframe row to updated row

When I'm trying to do the same thing in my code as mentioned below: dataframe.map(row => { val row1 = row.getAs[String](1); val make = if (row1.toLowerCase == "tesla") "S" else row1; Row(row(0), make, row(2)) }) I have taken the above reference…
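Because the lambda returns a generic Row, Spark cannot infer an encoder; passing a RowEncoder built from the schema is the usual remedy. A sketch assuming Spark 2.x, where RowEncoder lives in org.apache.spark.sql.catalyst.encoders:

    import org.apache.spark.sql.{DataFrame, Dataset, Row}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder

    object MapRowsSketch {
      // Mapping a DataFrame to generic Rows needs an explicit Encoder[Row],
      // built here from the unchanged schema.
      def rewriteMake(dataframe: DataFrame): Dataset[Row] =
        dataframe.map { row =>
          val row1 = row.getAs[String](1)
          val make = if (row1.toLowerCase == "tesla") "S" else row1
          Row(row(0), make, row(2))
        }(RowEncoder(dataframe.schema))
    }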
35
votes
2 answers

Encoder for Row Type Spark Datasets

I would like to write an encoder for a Row type in DataSet, for a map operation that I am doing. Essentially, I do not understand how to write encoders. Below is an example of a map operation: In the example below, instead of returning…
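A sketch of declaring an Encoder[Row] explicitly from a StructType, again assuming Spark 2.x; the schema and data here are illustrative:

    import org.apache.spark.sql.{Encoder, Row, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    object RowEncoderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("row-enc").getOrCreate()
        import spark.implicits._
        val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
        // Encoder for the Rows produced by the map below.
        implicit val rowEncoder: Encoder[Row] = RowEncoder(StructType(Seq(
          StructField("id", IntegerType),
          StructField("name", StringType))))
        df.map(r => Row(r.getInt(0) + 1, r.getString(1))).show()
        spark.stop()
      }
    }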
28
votes
2 answers

How to convert a dataframe to dataset in Apache Spark in Scala?

I need to convert my dataframe to a dataset and I used the following code: val final_df = Dataframe.withColumn( "features", toVec4( // casting into Timestamp to parse the string, and then into Int …
user8131063
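The direct route is .as[T] with a matching case class rather than a hand-rolled conversion; a sketch in which Reading and its fields are placeholders:

    import java.sql.Timestamp
    import org.apache.spark.sql.SparkSession

    // Field names must match the DataFrame's column names.
    case class Reading(ts: Timestamp, value: Int)

    object DfToDs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("as-ds").getOrCreate()
        import spark.implicits._
        val df = Seq((Timestamp.valueOf("2017-01-01 00:00:00"), 1)).toDF("ts", "value")
        val ds = df.as[Reading] // typed Dataset without a hand-written map
        ds.show()
        spark.stop()
      }
    }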
23
votes
3 answers

How to create a custom Encoder in Spark 2.X Datasets?

Spark Datasets move away from Rows to Encoders for POJOs/primitives. The Catalyst engine uses an ExpressionEncoder to convert columns in a SQL expression. However, there do not appear to be other subclasses of Encoder available to use as a…
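Encoder has no supported user-facing subclasses to extend; the public factory methods on Encoders are the intended extension points. A sketch of the main ones (Point and UUID are arbitrary examples):

    import org.apache.spark.sql.{Encoder, Encoders}

    case class Point(x: Int, y: Int)

    object EncoderFactories {
      // Derived encoder for case classes and tuples.
      val forProduct: Encoder[Point] = Encoders.product[Point]
      // Opaque binary encodings for everything else.
      val viaKryo: Encoder[java.util.UUID] = Encoders.kryo[java.util.UUID]
      val viaJavaSer: Encoder[java.util.UUID] = Encoders.javaSerialization[java.util.UUID]
    }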
18
votes
3 answers

Why is the error "Unable to find encoder for type stored in a Dataset" when encoding JSON using case classes?

I've written a Spark job: object SimpleApp { def main(args: Array[String]) { val conf = new SparkConf().setAppName("Simple Application").setMaster("local") val sc = new SparkContext(conf) val ctx = new…
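The usual fix is to create a SparkSession, import its implicits, and read the JSON straight into a Dataset of a top-level case class. A sketch (Sensor and sensors.json are placeholders):

    import org.apache.spark.sql.SparkSession

    // Top-level case class so an encoder can be derived for it.
    case class Sensor(id: String, value: Double)

    object JsonToDs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local").appName("Simple Application").getOrCreate()
        import spark.implicits._ // the import the failing code is missing
        val ds = spark.read.json("sensors.json").as[Sensor]
        ds.show()
        spark.stop()
      }
    }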
14
votes
2 answers

Encode an ADT / sealed trait hierarchy into Spark DataSet column

If I want to store an Algebraic Data Type (ADT) (ie a Scala sealed trait hierarchy) within a Spark DataSet column, what is the best encoding strategy? For example, if I have an ADT where the leaf types store different kinds of data: sealed trait…
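A sketch of one common strategy, an opaque Kryo encoding of the whole hierarchy; the trait and leaf classes are illustrative:

    import org.apache.spark.sql.{Encoder, Encoders}

    sealed trait Occupation
    case class SoftwareEngineer(language: String) extends Occupation
    case class Doctor(specialty: String) extends Occupation

    object AdtEncoding {
      // Spark cannot derive an encoder for the sealed trait itself, so the
      // whole hierarchy is encoded as one binary column.
      implicit val occupationEncoder: Encoder[Occupation] = Encoders.kryo[Occupation]
    }

The trade-off is that the column becomes a single binary blob invisible to Catalyst, so it cannot be filtered or projected with SQL.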
12
votes
1 answer

Apache Spark 2.0: java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate

I am using Apache Spark 2.0 and creating a case class to define the schema for a Dataset. When I try to define a custom encoder for java.time.LocalDate, following How to store custom objects in Dataset?, I get the following exception:…
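Spark 2.x ships no encoder for java.time.LocalDate (native support arrived later, in Spark 3), so the workarounds are a Kryo fallback, sketched below, or converting to java.sql.Date:

    import java.time.LocalDate
    import org.apache.spark.sql.{Encoder, Encoders}

    object LocalDateEncoding {
      // Opaque fallback for Spark 2.x; alternatively, map LocalDate to
      // java.sql.Date, which does have a built-in encoder.
      implicit val localDateEncoder: Encoder[LocalDate] = Encoders.kryo[LocalDate]
    }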
10
votes
3 answers

Convert scala list to DataFrame or DataSet

I am new to Scala. I am trying to convert a Scala list (which holds the results of some calculated data on a source DataFrame) to a DataFrame or Dataset. I am not finding any direct method to do that. However, I have tried the following process…
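With spark.implicits._ in scope, a local list converts directly via toDF or toDS; a sketch with placeholder data:

    import org.apache.spark.sql.SparkSession

    object ListToDf {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("list-to-df").getOrCreate()
        import spark.implicits._
        val results = List(("a", 1.0), ("b", 2.0)) // stand-in for the computed values
        results.toDF("key", "score").show()        // DataFrame with named columns
        results.toDS().show()                      // or a typed Dataset
        spark.stop()
      }
    }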
8
votes
1 answer

Spark Dataset : Example : Unable to generate an encoder issue

New to the Spark world and trying a Dataset example written in Scala that I found online. On running it through SBT, I keep getting the following error: org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class. Any idea…
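This AnalysisException almost always means the case class is nested inside another class; declaring it at the top level resolves it. A sketch:

    import org.apache.spark.sql.SparkSession

    // Broken: `class Job { case class Token(...) }` makes Token an inner
    // class, which Spark's encoder machinery rejects at runtime.
    // Fixed: declare the case class at the top level of the file.
    case class Token(word: String, count: Int)

    object InnerClassFix {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("inner").getOrCreate()
        import spark.implicits._
        Seq(Token("spark", 2)).toDS().show()
        spark.stop()
      }
    }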
8
votes
1 answer

Spark Dataset and java.sql.Date

Let's say I have a Spark Dataset like this: scala> import java.sql.Date scala> case class Event(id: Int, date: Date, name: String) scala> val ds = Seq(Event(1, Date.valueOf("2016-08-01"), "ev1"), Event(2, Date.valueOf("2018-08-02"), "ev2")).toDS I…
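java.sql.Date has a built-in encoder, so the Dataset itself is fine; a sketch reproducing the example and adding an illustrative typed filter on the date field:

    import java.sql.Date
    import org.apache.spark.sql.SparkSession

    case class Event(id: Int, date: Date, name: String)

    object DateDs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("dates").getOrCreate()
        import spark.implicits._ // java.sql.Date has a built-in encoder
        val ds = Seq(Event(1, Date.valueOf("2016-08-01"), "ev1"),
                     Event(2, Date.valueOf("2018-08-02"), "ev2")).toDS()
        // e.g. a typed filter on the date field:
        ds.filter(_.date.before(Date.valueOf("2017-01-01"))).show()
        spark.stop()
      }
    }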
6
votes
2 answers

Rename columns in spark using @JsonProperty while creating Datasets

Is there a way to rename column names using Jackson annotations while creating a Dataset? My encoder class is as follows: import com.fasterxml.jackson.annotation.JsonProperty; import lombok.*; import scala.Serializable; import…
Arjav96
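Encoders.bean derives column names from the bean property names and does not consult Jackson annotations, so @JsonProperty has no effect here; renaming on the untyped DataFrame side is one workaround. A sketch with hypothetical column names:

    import org.apache.spark.sql.SparkSession

    object RenameColumns {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("rename").getOrCreate()
        import spark.implicits._
        // Column names come from the properties, not @JsonProperty;
        // rename after the fact instead (names here are hypothetical).
        val df = Seq(("a", 1)).toDF("internalName", "value")
        df.withColumnRenamed("internalName", "external_name").show()
        spark.stop()
      }
    }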
6
votes
5 answers

How to map rows to protobuf-generated class?

I need to write a job that reads a DataSet[Row] and converts it to a DataSet[CustomClass] where CustomClass is a protobuf class. val protoEncoder = Encoders.bean(classOf[CustomClass]) val transformedRows = rows.map { case Row(f1: String, f2: Long…
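Protobuf-generated classes are not JavaBeans (they lack no-arg setters), so Encoders.bean generally fails on them; a Kryo encoder is one fallback. A generic sketch (mapToProto is a hypothetical helper):

    import scala.reflect.ClassTag
    import org.apache.spark.sql.{Dataset, Encoder, Encoders, Row}

    object ProtoMapping {
      // Generic fallback: encode the target type opaquely with Kryo, since
      // protobuf messages lack the setters Encoders.bean expects.
      def mapToProto[T: ClassTag](rows: Dataset[Row])(build: Row => T): Dataset[T] = {
        implicit val enc: Encoder[T] = Encoders.kryo[T]
        rows.map(build)
      }
    }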
6
votes
1 answer

scala generic encoder for spark case class

How can I get this method to compile? Strangely, Spark's implicits are already imported. def loadDsFromHive[T <: Product](tableName: String, spark: SparkSession): Dataset[T] = { import spark.implicits._ spark.sql(s"SELECT * FROM…
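The missing piece is a TypeTag context bound: spark.implicits._ can only derive a Product encoder when a TypeTag for T is available at the call site. A sketch of the compiling signature:

    import scala.reflect.runtime.universe.TypeTag
    import org.apache.spark.sql.{Dataset, SparkSession}

    object HiveLoader {
      // T <: Product : TypeTag lets spark.implicits._ derive the encoder.
      def loadDsFromHive[T <: Product : TypeTag](tableName: String, spark: SparkSession): Dataset[T] = {
        import spark.implicits._
        spark.sql(s"SELECT * FROM $tableName").as[T]
      }
    }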