0

The task I am trying to achieve is the following: I have a table with two columns : - u0 and u1. The values in the two columns sometimes are repeated, meaning that one value would appear in both u0 and u1 .

          u0                       u1    
1         A                         B 
2         B                         C
3         D                         B
4         B                         D
5         F                         E

If the table has one row: u0=A, u1=B and another row u0=B, u1=C then we can conclude that A,B and C all belong to the same user. The idea is to build a table with a column where every row contains a list of U entries that belong to one user in a comma separated string.

Desiered output:

       IDs
u0    A,B,C,D
u1    E,F,G

The way I have approached the task is with the following :

select 
  distinct 
    stuff((  
        select ',' + u0
        from [db1]
        where u0 like u1 
        for xml path('')
 ),1,1,'')as uids 
from [db1]

But with this query in BigQuery I get the error :

Error: Encountered " "FOR" "for "" at line 7, column 9. Was expecting: ")" ...

I have tried with COALESCE as well , but with no success

SELECT uids = COALESCE(u1 + ',','') + u0
FROM [db1]
WHERE u0 in u1

and got this error message back :

Error: Encountered " "IN" "in "" at line 3, column 12. Was expecting:

Thank you in advance.

Glaucon
  • 71
  • 3
  • 9
  • Your table reference is clearly legacy BigQuery, so I removed the inappropriate databases. – Gordon Linoff Feb 23 '17 at 15:06
  • Current syntax: [String_agg](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg)... Old syntax: [Group_concat](https://cloud.google.com/bigquery/docs/reference/legacy-sql#group_concat) – JohnHC Feb 23 '17 at 15:09
  • you should use GROUP_CONCAT sql function check this link http://stackoverflow.com/questions/42418641/split-row-results-to-column/42419232#42419232 – Keval Pithva Feb 23 '17 at 15:11
  • Thank you all for the swift replies. I have one question : Is there some similar function to group_concat and string_agg for the cases where the data is not VARCHAR but integer ? Thank you – Glaucon Feb 23 '17 at 15:26
  • You can use `ARRAY_AGG`, or you can use `STRING_AGG(CAST(x AS STRING))`. In your sample output above, how would you end up with `G` in the `u1` column? – Elliott Brossard Feb 23 '17 at 15:29

1 Answers1

2

I don't think that I entirely understand the question, but if you just want to get the string concatenation of distinct u0 values that are in u1, you can do something like this:

#standardSQL
SELECT 
  STRING_AGG(DISTINCT u0)
FROM T
WHERE u0 IN (SELECT u1 FROM T);

As a self-contained example:

#standardSQL
WITH T AS (
  SELECT *
  FROM UNNEST(ARRAY<STRUCT<u0 STRING, u1 STRING>>[
    ('A', 'B'),
    ('B', 'C'),
    ('D', 'B'),
    ('B', 'D'),
    ('F', 'E')
  ])
)
SELECT 
  STRING_AGG(DISTINCT u0)
FROM T
WHERE u0 IN (SELECT u1 FROM T);
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • Thank you for the answer Elliott. When i tried the first proposed solution I just got back a one column table containing the values of both u0 and u1 but not in a comma separated string , but each observation in a separate field. The second example is not scalable given that I have 3 million observations. – Glaucon Feb 24 '17 at 08:25