What is the preferred way to implement a class based on the Function1/MapFunction interfaces in Spark 2.3, when the class will change the schema of individual rows? Ultimately, every row's schema might end up different, depending on the results of various look-ups.
Something like:
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Row;

public class XyzProcessor implements MapFunction<Row, Row> {
    ...
    @Override
    public Row call(Row row) throws Exception {
        // The `row` schema will be changed here...
        return row;
    }
    ...
}
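For illustration, since Spark Rows are immutable, the body of call would build a new Row rather than mutate the incoming one in place. A minimal sketch, where lookupValueFor(...) is a hypothetical helper standing in for the look-ups and RowFactory comes from org.apache.spark.sql.RowFactory:

@Override
public Row call(Row row) throws Exception {
    // Copy the existing values and append the look-up result
    // as a new trailing column.
    Object[] values = new Object[row.length() + 1];
    for (int i = 0; i < row.length(); i++) {
        values[i] = row.get(i);
    }
    values[row.length()] = lookupValueFor(row); // hypothetical helper
    return RowFactory.create(values);
}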
The Dataset's .map method is then called as:

// Encoder derived from the schema of the source Dataset `foo`
ExpressionEncoder<Row> rowEncoder = RowEncoder.apply(foo.schema());
Dataset<Row> result = dataset.map(new XyzProcessor(), rowEncoder);
The "problem" is that the XyzProcessor will alter the schema by adding columns to the row thus rendering the rowEncoder in a faulty state schema wise. How is the preferred way to deal with this?
Is this the right way to accomplish Dataset modifications?