I am trying to implement the following code:

    StructType dataStruct = new StructType()
            .add("items", DataTypes.createArrayType(DataTypes.StringType, false), false);
    ExpressionEncoder<Row> encoder = RowEncoder.apply(dataStruct);

    Dataset<Row> arrayItems = transactions.map((MapFunction<Row, Row>) row -> {
        List<String> items = new LinkedList<>();
        for (int i = 1; i <= 12; i++) {
            if (row.getString(i) != null)
                items.add(row.getString(i));
        }
        System.out.println(items);
        return RowFactory.create(items.toArray());
    }, encoder);

to convert a dataset with this schema:

|user<String>|item1<String>|item2<String>|item3<String>|...|item12<String>|

into a dataset with this schema:

|item<String[]>|

but I get the following exception: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of array

I don't understand why RowFactory receives a String as its argument rather than a String[]. Can somebody explain what I should do in this situation?

Data example:

user|item1|item2|item3|item4|item5|item6|item7|item8|item9|item10|item11|item12
Bob|01W|01J|01W|01J|01W|01J|01W|01J|01W|01J|null|null
John|03T|018T|003H|A44I|03T|null|003H|A44I|03T|018T|003H|null
Bill|CMZI|UDAG|01W|null|null|01J|018T|003H|A44I|018T|003H|A44I
Jack Loki

1 Answer

This happens because varargs in Java are just syntactic sugar and

Object ... values

is equivalent to

Object[] values

so

return RowFactory.create(items.toArray());

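will expand the array into separate arguments, so each string becomes its own Row field and the first field is a String rather than the array the schema expects. You need to wrap the array in an outer array so that it is passed as a single field: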

Object[] rowItems = {items.toArray()};
return RowFactory.create(rowItems);
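
The varargs behaviour can be checked without Spark. The following is a minimal sketch, not the Spark API: `fieldCount` is a hypothetical stand-in for any method declared with `Object... values` (as `RowFactory.create` is), and the class name and sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDemo {
    // Hypothetical stand-in for a varargs method such as RowFactory.create(Object... values).
    // It reports how many separate arguments the method actually received.
    static int fieldCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("01W", "01J", "03T");

        // The Object[] returned by toArray() is used directly as the varargs array,
        // so the method sees three separate String arguments.
        System.out.println(fieldCount(items.toArray()));          // prints 3

        // Wrapping it in an outer Object[] passes the inner array as one argument.
        Object[] rowItems = {items.toArray()};
        System.out.println(fieldCount(rowItems));                 // prints 1

        // Casting to Object also prevents the expansion.
        System.out.println(fieldCount((Object) items.toArray())); // prints 1
    }
}
```

With a single-column array schema, only the forms that yield one argument line up with the schema; the first form produces one field per string, which matches the exception in the question.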

Further reading: "Can I pass an array as arguments to a method with variable arguments in Java?"

Alper t. Turker