I am using a third-party package for Spark that provides a "PointFeature" object. I am trying to take a CSV file and put elements from each row into an Array of these PointFeature objects.
The PointFeature constructor for my implementation looks like this:
Feature(Point(_c1, _c2), _c3)
where _c1, _c2, and _c3 are the columns of my CSV and hold doubles.
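So, to illustrate (the literal values here are made up), a row like 1.0,2.0,3.0 should become:

```scala
Feature(Point(1.0, 2.0), 3.0)
```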
Here is my current attempt:
val points: Array[PointFeature[Double]] = for{
line <- sc.textFile("file.csv")
point <- Feature(Point(line._c1, line._c2), line._c3.toDouble)
} yield point
The errors show up where I reference the columns:
<console>:36: error: value _c1 is not a member of String
point <- Feature(Point(line._c1,line._c2),line._c3.toDouble)
^
<console>:36: error: value _c2 is not a member of String
point <- Feature(Point(line._c1,line._c2),line._c3.toDouble)
^
<console>:36: error: value _c3 is not a member of String
point <- Feature(Point(line._c1,line._c2),line._c3.toDouble)
^
This is obviously because I'm referencing a String as if it were an element of a DataFrame. I'm wondering if there is a way to work with DataFrames in this loop format, or a way to split each line into a List of Doubles. Maybe I need an RDD?
Also, I'm not certain that this will yield an Array. Actually, I suspect it will return an RDD...
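For reference, here is the kind of split-and-map approach I was considering, but I have not been able to test it. It assumes the file has no header row, that the fields are plain comma-separated doubles, and that Feature and Point come from the third-party package:

```scala
// Untested sketch: parse each line into three Doubles, then build a feature.
val features: RDD[PointFeature[Double]] = sc.textFile("file.csv").map { line =>
  val cols = line.split(",").map(_.trim.toDouble)
  Feature(Point(cols(0), cols(1)), cols(2))
}

// collect() pulls the distributed data back to the driver as an Array,
// which may be a problem if the file is large.
val points: Array[PointFeature[Double]] = features.collect()
```

I don't know whether collect() is the right way to get an Array here, or whether I should keep everything as an RDD.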
I am using Spark 2.1.0 on Amazon EMR.
Here are some other questions I have drawn from:
How to read csv file into an Array of arrays in scala