0

My employees have one ID for each different application. I want to find all the IDs belonging to the same employee.

As input I have a table with 2 columns of IDs, ID1 and ID2 that I know belong to the same employee. But they are not unique and in the table I can find a row below with ID2,ID3 and then my employee would have all three IDs ID1,ID2,ID3.

An example is below for clarity:

enter image description here

I need to do this query in SQL and I think multiple iterations are needed to group the IDs.

Note: this is to follow up on so called dead question closed back in July - that someone asked but never got any response/answer

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230

1 Answers1

1

Below is for BigQuery Standard SQL

Usually this type of logic is implemented using so called recursive CTEs, but BigQuery does not support such!

Luckily, recently introduced scripting allows to implement this in BigQuery - see example below

DECLARE rows_count, run_away_stop INT64 DEFAULT 0;

CREATE TEMP TABLE ttt AS WITH input AS (
  SELECT 11 EmpID1, 12 EmpID2 UNION ALL
  SELECT 13, 14 UNION ALL
  SELECT 11, 15 UNION ALL
  SELECT 14, 12 UNION ALL
  SELECT 16, 20 UNION ALL
  SELECT 17, 18 UNION ALL
  SELECT 18, 22 UNION ALL
  SELECT 22, 24 UNION ALL
  SELECT 88, 89 UNION ALL
  SELECT 90, 92 UNION ALL
  SELECT 91, 93 UNION ALL
  SELECT 90, 89 
)
SELECT ARRAY(SELECT val FROM UNNEST([EmpID1, EmpID2]) val ORDER BY val ) arr FROM input;

LOOP
  SET (run_away_stop, rows_count) = (SELECT AS STRUCT run_away_stop + 1, COUNT(1) FROM ttt);

  CREATE OR REPLACE TEMP TABLE ttt AS
  SELECT ANY_VALUE(arr) arr FROM (
    SELECT ARRAY(SELECT DISTINCT val FROM UNNEST(arr) val ORDER BY val) arr
    FROM (
      SELECT ANY_VALUE(arr1) arr1, ARRAY_CONCAT_AGG(arr) arr    
      FROM (
        SELECT t1.arr arr1, t2.arr arr2, ARRAY(SELECT DISTINCT val FROM UNNEST(ARRAY_CONCAT( t1.arr, t2.arr)) val ORDER BY val) arr 
        FROM ttt t1, ttt t2 
        WHERE (SELECT COUNT(1) FROM UNNEST(t1.arr) val JOIN UNNEST(t2.arr) val USING(val)) > 0
      ) GROUP BY FORMAT('%t', arr1)
    )
  ) GROUP BY FORMAT('%t', arr);

  IF (rows_count = (SELECT COUNT(1) FROM ttt) AND run_away_stop > 1) OR run_away_stop > 10 THEN BREAK; END IF;
END LOOP;

SELECT FORMAT('%t', arr) ListOfIDs FROM ttt;  

with result

Row ListOfIDs    
1   [91, 93]     
2   [88, 89, 90, 92]     
3   [17, 18, 22, 24]     
4   [11, 12, 13, 14, 15]     
5   [16, 20]     
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230