
In BigQuery, the "ARRAY_AGG" function converts a flat collection into a nested one. Is there a similar way to build the same kind of nested structure with BeamSQL? Something like the following query:

  SELECT column1, ARRAY_AGG(STRUCT(column2, column3)) FROM PCOLLECTION GROUP BY column1
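To make the requested shape concrete, here is a plain-Java sketch (no Beam involved) of what "SELECT column1, ARRAY_AGG(STRUCT(column2, column3)) ... GROUP BY column1" computes: one output row per distinct column1, each carrying a repeated field of (column2, column3) structs, i.e. a nested repeated row. The names ArrayAggSketch, FlatRow, Struct, NestedRow and arrayAgg, as well as the column types, are hypothetical, and records assume Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArrayAggSketch {
    // Hypothetical flat input row: (column1, column2, column3).
    record FlatRow(String column1, String column2, int column3) {}

    // Hypothetical STRUCT(column2, column3) element.
    record Struct(String column2, int column3) {}

    // One output row: the grouping key plus a repeated field of structs.
    record NestedRow(String column1, List<Struct> agg) {}

    // Mirrors SELECT column1, ARRAY_AGG(STRUCT(column2, column3)) ... GROUP BY column1.
    static List<NestedRow> arrayAgg(List<FlatRow> rows) {
        // LinkedHashMap keeps groups in first-seen order; SQL itself guarantees no order.
        Map<String, List<Struct>> groups = new LinkedHashMap<>();
        for (FlatRow r : rows) {
            groups.computeIfAbsent(r.column1(), k -> new ArrayList<>())
                  .add(new Struct(r.column2(), r.column3()));
        }
        List<NestedRow> out = new ArrayList<>();
        groups.forEach((key, structs) -> out.add(new NestedRow(key, structs)));
        return out;
    }

    public static void main(String[] args) {
        List<FlatRow> input = List.of(
            new FlatRow("a", "x", 1),
            new FlatRow("b", "y", 2),
            new FlatRow("a", "z", 3));
        // Prints one nested row per distinct column1 value.
        for (NestedRow row : arrayAgg(input)) {
            System.out.println(row);
        }
    }
}
```

The insertion-ordered map is only a convenience for readable output; a SQL engine may return the groups, and the elements inside each array, in any order unless ARRAY_AGG is given an ORDER BY.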

lourdu rajan
  • Google Cloud Dataflow actually has a SQL product in alpha now. Cloud Dataflow SQL is trying to match BigQuery SQL's functionality. Right now, though, BigQuery's ARRAY_AGG is not supported by Cloud Dataflow SQL (the supported functionality is listed at https://cloud.google.com/dataflow/docs/reference/sql/). – Rui Wang Jul 02 '19 at 20:01

1 Answer


If I understand your question correctly, you should be able to use the ARRAY constructor, as in "SELECT ARRAY[1, 2, 3] f_arr". This test passes:

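  // Assumes fields defined elsewhere in the test class: `pipeline` (in Beam's own
  // tests, a TestPipeline @Rule) and a TableProvider `readOnlyTableProvider`.
  @Test
  public void testArrayConstructor() {
    BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
    // ARRAY[...] builds one array value from literals inside a single row.
    PCollection<Row> stream =
        BeamSqlRelUtils.toPCollection(pipeline, sqlEnv.parseQuery("SELECT ARRAY[1, 2, 3] f_arr"));
    // Expect exactly one row whose INT32 array field f_arr holds [1, 2, 3].
    PAssert.that(stream)
        .containsInAnyOrder(
            Row.withSchema(Schema.builder().addArrayField("f_arr", FieldType.INT32).build())
                .addValue(Arrays.asList(1, 2, 3))
                .build());
    // Duration here is org.joda.time.Duration.
    pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
  }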


Anton
  • Thanks for your reply. I guess your sample code and references cover nested rows in BigQuery, but not nested repeated rows. Anyway, let me explore these examples and references, and I'll let you know my feedback. What I am looking for is a nested repeated row. – lourdu rajan Jun 06 '19 at 07:45