Spark DataFrame schema:
In [177]: testtbl.printSchema()
root
|-- Date: long (nullable = true)
|-- Close: double (nullable = true)
|-- Volume: double (nullable = true)
I wish to apply a scalar-valued function to a column of testtbl. Suppose I wish to calculate the average of the 'Close' column. For an RDD I would do something like
rdd.fold(0, lambda x,y: x+y)
But testtbl.Close is not an RDD; it is a Column object with limited functionality. The rows of testtbl form an RDD, but its columns do not. So how do I apply add, or a user-defined function, to a single column?