I am trying to implement the following code:
StructType dataStruct = new StructType()
        .add("items", DataTypes.createArrayType(DataTypes.StringType, false), false);
ExpressionEncoder<Row> encoder = RowEncoder.apply(dataStruct);
Dataset<Row> arrayItems = transactions.map((MapFunction<Row, Row>) row -> {
    List<String> items = new LinkedList<>();
    for (int i = 1; i <= 12; i++) {
        if (row.getString(i) != null)
            items.add(row.getString(i));
    }
    System.out.println(items);
    return RowFactory.create(items.toArray());
}, encoder);
to convert a dataset with this schema:
|user<String>|item1<String>|item2<String>|item3<String>|...|item12<String>|
into a dataset with this schema:
|item<String[]>|
but I get the following exception: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of array
I don't understand why RowFactory receives a String as its argument rather than a String[]. Can somebody explain what I should do in this situation?
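One possible explanation, sketched in plain Java without Spark: RowFactory.create is declared with varargs (Object... values), so an Object[] passed directly is treated as the argument list itself, and each String becomes its own field. The fieldCount method below is a hypothetical stand-in for such a varargs factory and is not part of Spark; it only counts how many fields a call would produce.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class VarargsDemo {

    // Hypothetical stand-in for a varargs factory such as RowFactory.create(Object... values).
    // It only reports how many separate fields the call would produce.
    public static int fieldCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        List<String> items = new LinkedList<>(Arrays.asList("01W", "01J", "018T"));

        // An Object[] passed directly becomes the varargs array itself:
        // one field per element, so the first field is a String, not an array.
        System.out.println(fieldCount(items.toArray())); // prints 3

        // Casting to Object wraps the whole array as a single argument,
        // so the call produces exactly one field holding the array.
        System.out.println(fieldCount((Object) items.toArray(new String[0]))); // prints 1
    }
}
```

If this is indeed the cause, the equivalent change in the code above would be something like RowFactory.create((Object) items.toArray(new String[0])), which I have not verified against Spark itself.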
Data example:
user|item1|item2|item3|item4|item5|item6|item7|item8|item9|item10|item11|item12
Bob|01W|01J|01W|01J|01W|01J|01W|01J|01W|01J|null|null
John|03T|018T|003H|A44I|03T|null|003H|A44I|03T|018T|003H|null
Bill|CMZI|UDAG|01W|null|null|01J|018T|003H|A44I|018T|003H|A44I