I have a Spark dataframe containing vector data in one of its columns like the first column shown below
+--------------------+-----+-----------+
| features|Label|OutputLabel|
+--------------------+-----+-----------+
|(1133,[33,296,107...| 0| 0.0|
|(1133,[19,1045,10...| 0| 0.0|
|(1133,[9,398,1075...| 0| 0.0|
|(1133,[0,927,1074...| 0| 0.0|
|(1133,[41,223,107...| 0| 0.0|
|(1133,[70,285,108...| 0| 0.0|
|(1133,[4,212,1074...| 0| 0.0|
|(1133,[25,261,107...| 0| 0.0|
|(1133,[0,258,1074...| 0| 0.0|
|(1133,[2,219,1074...| 0| 0.0|
|(1133,[8,720,1074...| 0| 0.0|
|(1133,[2,260,1074...| 0| 0.0|
|(1133,[54,348,107...| 0| 0.0|
|(1133,[167,859,10...| 0| 0.0|
|(1133,[1,291,1074...| 0| 0.0|
|(1133,[1,211,1074...| 0| 0.0|
|(1133,[23,216,107...| 0| 0.0|
|(1133,[126,209,11...| 0| 0.0|
|(1133,[70,285,108...| 0| 0.0|
|(1133,[96,417,107...| 0| 0.0|
+--------------------+-----+-----------+
Please see the schema of this dataframe below:
root
|-- features: vector (nullable = true)
|-- Label: integer (nullable = true)
|-- OutputLabel: double (nullable = true)
Question 1 : I need to split the first column into two columns, so that the integer data goes into one column and the array data into another. I am not sure how to do this in Spark/Scala; any pointers would be helpful.
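One way to do this is with UDFs that pull the pieces out of the vector. A minimal sketch, assuming the dataframe is called `df`, the column holds `org.apache.spark.ml.linalg` vectors (as the `vector` type in the schema suggests), and the leading integer in the printed form is the vector size while the first bracketed list is the array of non-zero indices:

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// The leading integer in the printed sparse vector is its size;
// the first bracketed list is the array of non-zero indices.
val vectorSize    = udf((v: Vector) => v.size)
val vectorIndices = udf((v: Vector) => v.toSparse.indices)

val split = df
  .withColumn("Size", vectorSize(col("features")))
  .withColumn("Indices", vectorIndices(col("features")))

split.printSchema()
// Size should come out as integer, Indices as array<int>
```

If the column instead holds the older `org.apache.spark.mllib.linalg` vectors, the same approach applies with that import swapped in.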
When I tried to write this dataframe as a CSV file, I got the error below:
Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support struct<type:tinyint,size:int,indices:array<int>,values:array<double>> data type.
Question 2 : I understand that this dataframe cannot be written as a text file either, since text output writes only a single column and that column must not be of struct type. So, after splitting the first column into two separate columns, is it possible to write the dataframe out? The second column will be of array type. Can it be written to an output file in that form?
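Once the vector column is split, the remaining blocker is that the CSV writer also rejects array columns, so the array has to be serialized to a delimited string first. A sketch, assuming the `split` dataframe from Question 1 (with an `Indices` column of type `array<int>`) and a hypothetical output path:

```scala
import org.apache.spark.sql.functions.{col, concat_ws}

// concat_ws joins the elements of an array<string> column,
// so cast the int array to array<string> first.
val writable = split.withColumn(
  "Indices",
  concat_ws(";", col("Indices").cast("array<string>"))
)

writable.write
  .option("header", "true")
  .csv("/tmp/output")   // hypothetical path
```

The semicolon is an arbitrary element delimiter; anything that does not clash with the CSV field separator works.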
Question 3 : By any chance, can we write the array data alone into a CSV file?
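The same trick covers this case: select just the array column, stringify it, and write the resulting single-column dataframe. Again a sketch, assuming the `split` dataframe and an assumed output path:

```scala
import org.apache.spark.sql.functions.{col, concat_ws}

split
  .select(concat_ws(",", col("Indices").cast("array<string>")).as("Indices"))
  .write
  .csv("/tmp/indices_only")   // hypothetical path
```

Since the dataframe now has a single string column, it could equally be written with `.text(...)` instead of `.csv(...)`.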