There isn't any reasonable* way of doing this, because Vectors are not native types. Instead they are implemented as UserDefinedTypes
and as such can be processed only indirectly.
If the data is narrow, you might consider converting it to a matching strongly typed Dataset, but this is unlikely to bring any serious improvement (and may even decrease performance).
* One could derive a highly indirect solution, for example by:
- Adding a unique ID.
- Dumping the vector to JSON.
- Reading JSON by reserializing to the internal StructType representation.
- Exploding the vector with posexplode (DenseVector) or zipping indices and values (SparseVector).
- Self-joining by unique ID and index.
- Aggregating.
Any such approach would be expensive and completely impractical.
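
Purely to illustrate the shape of the footnoted pipeline, here is a minimal plain-Python stand-in (not Spark code). Several things are assumptions made for the sketch: the aggregate is taken to be a pairwise dot product, the names explode and pairwise_dot are hypothetical, and the JSON layout (a "values" field, plus "indices" for sparse vectors) is assumed to resemble what Spark's vectors dump to.

```python
import json
from collections import defaultdict


def explode(vec_json):
    """Turn one JSON-encoded vector into a list of (index, value) pairs."""
    v = json.loads(vec_json)
    if "indices" in v:
        # Sparse layout (assumed): zip the stored indices with their values.
        return list(zip(v["indices"], v["values"]))
    # Dense layout (assumed): the position in the array is the index.
    return list(enumerate(v["values"]))


def pairwise_dot(vectors_json):
    """Dot product for every pair of vectors, via explode + join + aggregate."""
    # Step 1: add a unique ID; step 4: explode into long (id, index, value) rows.
    long_rows = [
        (row_id, idx, val)
        for row_id, vec_json in enumerate(vectors_json)
        for idx, val in explode(vec_json)
    ]

    # Step 5: self-join on the index (grouping by index plays the role of the join key).
    by_index = defaultdict(list)
    for row_id, idx, val in long_rows:
        by_index[idx].append((row_id, val))

    # Step 6: aggregate the joined rows per pair of IDs.
    totals = defaultdict(float)
    for entries in by_index.values():
        for id_a, val_a in entries:
            for id_b, val_b in entries:
                if id_a < id_b:  # keep each unordered pair once
                    totals[(id_a, id_b)] += val_a * val_b
    return dict(totals)


vectors = [
    '{"type": 1, "values": [1.0, 2.0, 3.0]}',
    '{"type": 0, "size": 3, "indices": [0, 2], "values": [4.0, 5.0]}',
    '{"type": 1, "values": [0.0, 1.0, 0.0]}',
]
result = pairwise_dot(vectors)
```

Even in this toy form the cost is visible: every vector is serialized, parsed, expanded to one row per element, and then joined against every other vector sharing an index, which is why the same idea on a cluster would be so expensive.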