
I'm writing a Spark app in Scala with the following data:

+----------+--------------------+
|        id|                data|
+----------+--------------------+
|    id1   |[AC ED 00 05 73 7...|
|    id2   |[CF 33 01 61 88 9...|
+----------+--------------------+

The schema shows:

root
 |-- id: string (nullable = true)
 |-- data: binary (nullable = true)

I want to convert this DataFrame into a Map, with id as the key and data as the value.

I have tried:

df.as[(String, BinaryType)].collect.toMap

but I got the following error:

java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.types.BinaryType
- field (class: "org.apache.spark.sql.types.BinaryType", name: "_2")
- root class: "scala.Tuple2"

1 Answer


BinaryType is a Spark DataType. It maps in Scala/Java to Array[Byte].

Try df.as[(String, Array[Byte])].collect.toMap.

Make sure you've imported your session's implicits, e.g., import spark.implicits._, so the compiler can derive Encoder[T] instances implicitly.
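Putting it together, here is a minimal, self-contained sketch (the sample byte values and the local session are stand-ins for illustration; in your app the DataFrame already exists):

```scala
import org.apache.spark.sql.SparkSession

object BinaryToMap {
  // Builds a Map[String, Array[Byte]] from a two-column DataFrame
  // (id: string, data: binary), mirroring the question's schema.
  def build(): Map[String, Array[Byte]] = {
    // Local session purely for the example; a real app would reuse its session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("binary-to-map")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder instances, including for Array[Byte]

    // Sample rows standing in for the original binary data.
    val df = Seq(
      ("id1", Array[Byte](0xAC.toByte, 0xED.toByte, 0x00, 0x05)),
      ("id2", Array[Byte](0xCF.toByte, 0x33, 0x01, 0x61))
    ).toDF("id", "data")

    // Array[Byte] is the Scala type backing Spark's BinaryType,
    // so this typed conversion now finds an encoder.
    val byteMap = df.as[(String, Array[Byte])].collect().toMap

    spark.stop()
    byteMap
  }

  def main(args: Array[String]): Unit = {
    val byteMap = BinaryToMap.build()
    // Print the first value as hex, e.g. the bytes for id1.
    println(byteMap("id1").map("%02X".format(_)).mkString(" "))
  }
}
```

Note that collect() pulls all rows to the driver, so this only makes sense when the whole table fits in driver memory.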

  • I tried your suggestion, the error is: result.as[(String, Array[Byte])].collect.toMap java.lang.IllegalArgumentException: Unsupported class file major version 57 at org.apache.xbean.asm6.ClassReader.(ClassReader.java:166) – JQ. Mar 11 '20 at 05:21
  • You are likely running the wrong Java version. See https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version-55 – Sim Mar 12 '20 at 06:11
  • Source: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/BinaryType.html (The data type representing Array[Byte] values. Please use the singleton DataTypes.BinaryType.) – Prabhatika Vij Mar 24 '23 at 09:08