I would like to transpose a DataFrame, aggregating the values per column. Let me illustrate it with an example:
Given this DataFrame:
val df = sc.parallelize(Seq(("A","B","C"), ("D", "E", "F"), ("X", "Y", "Z"), ("A", "N", "Z"))).toDF("col1", "col2", "col3")
df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| A| B| C|
| D| E| F|
| X| Y| Z|
| A| N| Z|
+----+----+----+
The expected output should be something like this:
col1: Array("A", "D", "X")
col2: Array("B", "E", "Y", "N")
col3: Array("C", "F", "Z")
Note that the real DataFrame could contain hundreds of columns. It is not necessary to preserve the order of the columns in the output.
Edit: Note as well that the columns may contain repeated elements, but I only want the unique elements.
I am using Spark 2.0.2 with Scala 2.11.
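One direction I have been considering is aggregating every column with `collect_set`, which keeps only the unique values. A rough sketch of that idea (it works on the toy example above, but I am not sure it is the right approach, or whether it scales to hundreds of columns):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_set

val spark = SparkSession.builder().master("local[*]").appName("transpose").getOrCreate()
import spark.implicits._

val df = Seq(("A", "B", "C"), ("D", "E", "F"), ("X", "Y", "Z"), ("A", "N", "Z"))
  .toDF("col1", "col2", "col3")

// Build one collect_set aggregation per column; collect_set drops duplicates.
val aggExprs = df.columns.map(c => collect_set(c).as(c))

// A single agg produces one Row holding an array of unique values per column.
val row = df.agg(aggExprs.head, aggExprs.tail: _*).first()

// Map each column name to its array of unique values.
val result: Map[String, Seq[String]] =
  df.columns.zipWithIndex.map { case (c, i) => c -> row.getSeq[String](i) }.toMap
```

Here `result("col2")` contains `B`, `E`, `Y`, `N` (in no particular order, which is fine for my use case).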
Any suggestion?
Thanks in advance!