0

So I have to search for a bunch of names in a BigQuery table which I collect periodically in another dataset. The dataset is too large at this point containing almost ~60k names and I no longer can do

SELECT * FROM base.table WHERE name LIKE '%name1%' OR name LIKE '%name2%.....

As I tried it using a python script with:

SELECT * FROM base.table WHERE name LIKE({' OR '.join([f'ulv4.full_name LIKE %{name}%' for name in names])})

But the character limit for query exceeds for this many names. I tried looking at solutions like this and other answers to the same question but no answer seems to work for BigQuery Standard SQL. Any help in this regard is highly appreciated.

Hamza
  • 5,373
  • 3
  • 28
  • 43

1 Answers1

2

You should keep the names in another table and then join to it:

SELECT *
FROM base.table t1
WHERE EXISTS (SELECT 1 FROM other.table t2
              WHERE t1.name LIKE CONCAT('%', t2.name, '%'));

Then, any record in the base.table would only match if it contains some substring name from the other table.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Hey it passes the checks for query but upon running it throws error: `Scalar subquery produced more than one element` – Hamza May 25 '21 at 15:21
  • @Hamza Try the updated version using exists logic. – Tim Biegeleisen May 25 '21 at 15:22
  • Updated logic further throws error: `LEFT SEMI JOIN cannot be used without a condition that is an equality of fields from both sides of the join.` – Hamza May 25 '21 at 15:44
  • 1
    I'm stumped. I'll leave this answer up until someone else can give you a working solution. BigQuery is chock full of quirks. – Tim Biegeleisen May 25 '21 at 15:46