0

Suppose a source table like this:

Year        Long       Lat        Values    
========================================
2015         8          1           12     
2015         8          1           43
2015         8          1           11
...

The destiny table should be like this:

Year        Long       Lat        Field1          Field2        Field3  
========================================================================
2015         8          1           12             43             11
...

¿How can I do something like this in a single query (to use in Google BigQuery) ?

These post could be an approach of what I want: post1 post2

The "problem" is I will use this query in a Python Google Dataflow pipeline, so due to the restriction of read and write using apache beam I would need to do it in a single select (you can obviate the insert statement because I will use an apache beam function for that).

EXTRA

Using this select

select Fecha, Longitud, Latitud,
       ARRAY_AGG(Valor) as vector     
from `poc-cloud-209212.PruebasIoT.CanonicalFrames`
group by Fecha, Longitud, Latitud

Sample1

If I try to use:

select Fecha, Longitud, Latitud,
       ARRAY_AGG(Valor)[offset(0)] as field1,    
       ARRAY_AGG(Valor)[offset(1)] as field2
from `poc-cloud-209212.PruebasIoT.CanonicalFrames`
group by Fecha, Longitud, Latitud

I have the error:

Array index 1 is out of bounds (overflow)

The solution

This is what worked for my use case:

Simple query to understand easier the way you could do it base on the table of this post (which could have more rows, with different values than shows in the schema):

select year, long, lat,
       array_agg(values)[SAFE_OFFSET(0)] as field1,
       array_agg(values)[SAFE_OFFSET(1)] as field2,
       array_agg(values)[SAFE_OFFSET(2)] as field3
from t
group by year, long, lat;
IoT user
  • 1,222
  • 4
  • 22
  • 49

1 Answers1

1

You can use conditional aggregation. But I think the more typical method would be to put the fields in an array:

select year, long, lat,
       array_agg(values)
from t
group by year, long, lat;

Note that the order of the results is not guaranteed. SQL tables represent unordered sets, and your data has no column specifying the ordering.

You can easily expand this to fields:

select year, long, lat,
       array_agg(values)[offset(0)] as field1,
       array_agg(values)[offset(1)] as field2,
       array_agg(values)[offset(2)] as field3
from t
group by year, long, lat;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786