I am writing a Spark project in Scala in which I need to make some calculations from "demo" datasets. I am using the Databricks platform.
I need to collect the column _c1 (the 2nd column of the data shown below) from my DataFrame (trainingCoordDataFrame) into a list. The type of the list must be List[Int].
The data is shown below:
> +---+---+---+---+
> |_c0|_c1|_c2|_c3|
> +---+---+---+---+
> |1  |0  |0  |a  |
> |11 |9  |1  |a  |
> |12 |2  |7  |c  |
> |13 |2  |9  |c  |
> |14 |2  |4  |b  |
> |15 |1  |3  |c  |
> |16 |4  |6  |c  |
> |17 |3  |5  |c  |
> |18 |5  |3  |a  |
> |2  |0  |1  |a  |
> |20 |8  |9  |c  |
> |3  |1  |0  |b  |
> |4  |3  |4  |b  |
> |5  |8  |7  |b  |
> |6  |4  |9  |b  |
> |7  |2  |5  |a  |
> |8  |1  |9  |a  |
> |9  |3  |6  |a  |
> +---+---+---+---+
I am trying to create the list I want using the following command:
val trainingCoordList = trainingCoordDataFrame.select("_c1").collect().map(each => (each.getAs[Int]("_c1"))).toList
The command fails at runtime with this exception:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
Note that the procedure is:
1) Upload the dataset from my local PC to Databricks (so no standard dataset can be used).
val mainDataFrame = spark.read.format("csv").option("header", "false").load("FileStore/tables/First_Spacial_Dataset_ByAris.csv")
2) Create the DataFrame. (Step one: split the main DataFrame randomly. Step two: remove the unnecessary columns.)
val Array(trainingDataFrame,testingDataFrame) = mainDataFrame.randomSplit(Array(0.8,0.2)) //step one
val trainingCoordDataFrame = trainingDataFrame.drop("_c0", "_c3") //step two
3) Create the list. <- This is the command that fails.
What is the correct way to reach the result I want?
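For reference, the likely cause is that the CSV reader, when no schema is given and `inferSchema` is not enabled, reads every column as a String, so `getAs[Int]` fails on the cast. Below is a minimal sketch of one possible fix, which casts the column to an integer before collecting. It assumes every value in `_c1` parses as an integer (a non-numeric value would become null after the cast). The helper name `columnToIntList` and the small in-memory DataFrame are hypothetical stand-ins for the uploaded CSV, not part of my actual project.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnToIntList {

  // Hypothetical helper: cast a string column to int, then collect it to the driver.
  // Assumes the column holds only integer-parsable strings (no nulls after the cast).
  def columnToIntList(df: DataFrame, name: String): List[Int] =
    df.select(col(name).cast("int").as(name)) // alias keeps the original column name
      .collect()
      .map(_.getAs[Int](name))
      .toList

  def main(args: Array[String]): Unit = {
    // Local session for illustration; on Databricks the provided `spark` would be used.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-to-int-list")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for trainingCoordDataFrame: string columns, as a schema-less CSV read yields.
    val trainingCoordDataFrame = Seq(("0", "0"), ("9", "1"), ("2", "7")).toDF("_c1", "_c2")

    val trainingCoordList: List[Int] = columnToIntList(trainingCoordDataFrame, "_c1")
    println(trainingCoordList)

    spark.stop()
  }
}
```

An alternative under the same assumption would be to add `.option("inferSchema", "true")` to the `spark.read` call, so that numeric columns are typed when the file is loaded and the original `getAs[Int]` call no longer hits a String.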