0

I want to write a big query to fetch the value in a column from multiple tables in a given dataset. But the column name in each table is different like colA, colB, colC and so on. How to accomplish this?

I have many tables in my dataset where one of the column contains web URL. However this column name is different in each table. I want to process all the URL's of all the tables.

I checked in this link How to combine multiple tables that vary slightly in columns. However it talks about limited number of column name variation and limited number of tables.

I know

SELECT
column_name
FROM
`bq-project.bq-dataset.INFORMATION_SCHEMA.COLUMNS`
group by 1 

will give distinct column, but not sure how to proceed

Praveen
  • 31
  • 8

2 Answers2

0

You may create a view to translate column names.

CREATE VIEW my_dataset.aggregated_tables AS
SELECT * EXCEPT (colA), colA as url FROM table_a
UNION
SELECT * EXCEPT (colB), colB as url FROM table_b
UNION
SELECT * EXCEPT (colC), colC as url FROM table_c;
Yun Zhang
  • 5,185
  • 2
  • 10
  • 29
0

For fun, discovering what column has an URL, using JS UDFs:

CREATE TEMP FUNCTION urls(x STRING)
RETURNS STRING
LANGUAGE js AS r"""
  function isURL(str) {
    // https://stackoverflow.com/a/49185442/132438
    return /^(?:\w+:)?\/\/([^\s\.]+\.\S{2}|localhost[\:?\d]*)\S*$/.test(str); 
  }

  obj = JSON.parse(x);
  for (var key in obj){
    if(isURL(obj[key])) return(obj[key]);
  }
""";

WITH table_a AS (SELECT 'https://google.com/' aa)
,table_b AS (SELECT 'http://medium.com/@hoffa' ba, 'noturl' bb)
,table_c AS (SELECT 'bigquery' ca, 'noturl' cb, 'https://twitter.com/felipehoffa' cc)

SELECT urls(x) url
FROM (
  SELECT TO_JSON_STRING(t) x FROM table_a t
  UNION ALL
  SELECT TO_JSON_STRING(t) FROM table_b t
  UNION ALL
  SELECT TO_JSON_STRING(t) FROM table_c t
)

enter image description here

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325