There is no `groupByKey` method that takes a `Column` as an argument. There are only methods that take functions, either:
def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T]
or
def groupByKey[K](func: (T) ⇒ K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T]
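For illustration, here is a minimal sketch of the function-based variant. It assumes Spark 2.x or 3.x with `spark-sql` on the classpath. `Sale`, `TypedGrouping` and `totalsPerShop` are hypothetical names and are not part of Spark. The key comes from an ordinary Scala function (`_.shop`) rather than from a `Column`, and the groups are then processed with `mapGroups`:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; must be top-level so Spark can derive an Encoder for it
case class Sale(shop: String, amount: Long)

object TypedGrouping {
  // Hypothetical helper: total amount per shop using the typed groupByKey API
  def totalsPerShop(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._ // provides Encoder[String], Encoder[(String, Long)], toDS

    val sales: Dataset[Sale] =
      Seq(Sale("a", 10L), Sale("b", 5L), Sale("a", 7L)).toDS()

    sales
      .groupByKey(_.shop) // key extracted by a function T => K, not a Column
      .mapGroups((shop, rows) => (shop, rows.map(_.amount).sum)) // rows: Iterator[Sale]
      .collect()
      .toMap
  }
}
```

An untyped `DataFrame` can be grouped the same way by extracting the field from each `Row`, e.g. `df.groupByKey(_.getAs[String]("shop"))`, provided an `Encoder[String]` is in scope.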
Compare this with `groupBy`, which takes `Column`s:
def groupBy(cols: Column*): RelationalGroupedDataset
or `String`s:
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
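As a counterpart, here is a sketch of the two `groupBy` overloads, again assuming Spark 2.x or 3.x. The `RelationalGrouping` object and its data are hypothetical. Both calls produce a `RelationalGroupedDataset` that is aggregated with column expressions via `agg`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object RelationalGrouping {
  // Hypothetical helper: per-shop totals computed with both groupBy overloads
  def totalsPerShop(spark: SparkSession): (Map[String, Long], Map[String, Long]) = {
    import spark.implicits._ // provides toDF on local Seqs

    val df: DataFrame = Seq(("a", 10L), ("b", 5L), ("a", 7L)).toDF("shop", "amount")

    // groupBy(cols: Column*)
    val byColumn = df.groupBy(col("shop")).agg(sum("amount").as("total"))
    // groupBy(col1: String, cols: String*)
    val byName = df.groupBy("shop").agg(sum(col("amount")).as("total"))

    // Result rows are untyped: fields are read back by position
    def toMap(result: DataFrame): Map[String, Long] =
      result.collect().map(r => r.getString(0) -> r.getLong(1)).toMap

    (toMap(byColumn), toMap(byName))
  }
}
```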
The difference should be obvious: the first two return `KeyValueGroupedDataset` (intended for processing with the functional, strongly typed API, such as `mapGroups` or `reduceGroups`), while the latter two return `RelationalGroupedDataset` (intended for processing with the SQL-like API).
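One way to see this difference in the types is to call `count()` on each kind of grouped result. This is a sketch assuming Spark 2.x or 3.x, and `Order` and `GroupedCounts` are hypothetical names. `KeyValueGroupedDataset.count()` returns a typed `Dataset[(K, Long)]`, while `RelationalGroupedDataset.count()` returns an untyped `DataFrame`. The explicit type annotations below compile only because of that:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, top-level so an Encoder can be derived
case class Order(shop: String, amount: Long)

object GroupedCounts {
  // Hypothetical helper: counts per shop via the typed and the relational API
  def counts(spark: SparkSession): (Map[String, Long], Map[String, Long]) = {
    import spark.implicits._ // provides toDS, $"..." and tuple encoders

    val orders: Dataset[Order] =
      Seq(Order("a", 10L), Order("b", 5L), Order("a", 7L)).toDS()

    // Typed API: static types survive the aggregation
    val typed: Dataset[(String, Long)] = orders.groupByKey(_.shop).count()

    // Relational API: result is a DataFrame with columns "shop" and "count"
    val relational: DataFrame = orders.groupBy($"shop").count()

    val typedMap = typed.collect().toMap
    val relationalMap =
      relational.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
    (typedMap, relationalMap)
  }
}
```

The typed result can be collected directly into Scala tuples. The relational result has to be unpacked from `Row`s by position or name.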
In general see: