2

I have a dataset with various longitude and latitude datapoints. I would like to develop a centroid or "average" of those longitude and latitude coordinates in bigquery at a specific level of granularity.

Example of Current Data:

ID LONG LAT
101 -71.23403 42.01979
101 -91.469621 44.867211
102 78.8952716 38.4022661
102 80.8518668 35.3152386

Desired Output (output centroid is made up)

ID CENTROID_LONG CENTROID_LAT
101 -71.23403 42.01979
102 -91.469621 44.867211

Where the values above are aggregated to a centroid lat and long number.

lemon
  • 14,875
  • 6
  • 18
  • 38
Jaskeil
  • 1,044
  • 12
  • 33

1 Answers1

2

Consider below

select id, st_union_agg(st_geogpoint(long, lat)) points,
  st_centroid(st_union_agg(st_geogpoint(long, lat))) centroid
from your_table
group by id    

if applied to sample data in your question - output is

enter image description here

which is visualized as

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • This works! I wanted to ask, does average also work? I tried it out and got a reasonable "centroid" by just averaging lat/long of both IDs – Jaskeil Sep 08 '22 at 21:14
  • I think even for two points - average will not be the same as when using respective geo function. but for more then two - for sure will be different - very different! – Mikhail Berlyant Sep 08 '22 at 22:39
  • Thank you, I see the values are listed as POINT(-91.4770186504064 40.1330453849171), is there any way I can extract these two values and place it into two different columns which would indicate the lat and the long. – Jaskeil Sep 09 '22 at 15:15
  • I will be using the values to calculate linear distance and also potentially placing it on a map. – Jaskeil Sep 09 '22 at 15:46
  • 1
    to get lat and long in separate columns - you can use [`ST_X(centroid)`](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_x) and [`ST_Y(centroid)`](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_y) – Mikhail Berlyant Sep 09 '22 at 16:03
  • 1
    Great answer, note that there is `ST_CENTROID_AGG` function, that avoids constly `ST_UNION`: `select id, st_centroid_agg(st_geogpoint(long, lat)) centroid from your_table group by id ` – Michael Entin Sep 10 '22 at 06:43