I am facing an issue when trying to convert a DataFrame to a Dataset of objects with a custom field.
I have a DataFrame with two columns, country and currency, and I want to convert it into a Dataset of the MyObj
case class, where country is a String and currency is an Enumeration.
Here is the code:
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("country", StringType),
  StructField("currency", StringType)
))

// Define the sample data
val data = Seq(
  ("France", "EUR"),
  ("USA", "DOLLAR"),
  ("Germany", "EUR")
)

// Create a DataFrame from the sample data
val df = sparkSession.createDataFrame(data).toDF(schema.fieldNames: _*)

// Enumerations in Scala are defined as objects, not classes
object Currency extends Enumeration {
  type Currency = Value
  val EUR = Value("EUR")
  val DOLLAR = Value("DOLLAR")
}
import Currency.Currency

case class MyObj(country: String, currency: Currency)

val dsProduct = df.as[MyObj](Encoders.product[MyObj])
Here is the error I face when executing the program:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Try to map struct<country:string,currency:string> to Tuple1, but failed as the number of fields does not line up.
If I change the currency type to a string, it works just fine, but I want to keep it as an enumeration for another use case.
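For reference, this is the String-typed variant that works (a minimal sketch; the SparkSession setup and the MyObjStr name are just for illustration):

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Hypothetical local session for a self-contained repro
val sparkSession = SparkSession.builder()
  .appName("enum-encoder-repro")
  .master("local[*]")
  .getOrCreate()

// Same shape as MyObj, but currency is a plain String
case class MyObjStr(country: String, currency: String)

val df = sparkSession.createDataFrame(Seq(
  ("France", "EUR"),
  ("USA", "DOLLAR"),
  ("Germany", "EUR")
)).toDF("country", "currency")

// This succeeds because String has a built-in Spark encoder
val ds = df.as[MyObjStr](Encoders.product[MyObjStr])
```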
Any idea how I can fix that?