For a given table like
+--+--+
| A| B|
+--+--+
|aa|bb|
|cc|dd|
+--+--+
I want to get a dataframe like:
+---+---+
|._1|._2|
+---+---+
|aa | A |
|bb | B |
|cc | A |
|dd | B |
+---+---+
using Apache Spark and Scala. So basically I want tuples that have the original values at index 0 and the column name at index 1. This should work for any arbitrary schema. That means that the amount of columns is not known beforehands and as far as I understand I therefore cannot cast to datasets. This is how I tried to solve it:
val df = spark.read
.option("header", "true")
.option("sep",";")
.csv(path + "/tpch_nation.csv")
val cells = df.flatMap(tuple => {
tuple.toSeq.asInstanceOf[Seq[String]].zip(df.columns.toList)
})
cells.show()
However, this produces an java.lang.NullPointerException
inside the flatMap function. I am quite puzzled which object points to Null, and how I could solve the problem.