Updating bigquery by using a mapping tables using regular expressions

Question

Based on this questions and the perfect answer given there, I now need to use this approach to UPDATE a table column.

Table1 (data table)

textWithFoundItemInIt         | updated_foundItem
-------------------------------------------
hallo Adam                    |  nothing in here yet
Bert says hello               | ....
Bert says byebye              | 
Want to find "Caesar"bdjehg   |
Want to find "Caesar"bdjehg   |
Want to find "Caesar"again    |

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam

Expected result UPDATED

textWithFoundItemInIt         | updated_foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "Caesar"again and also Bert    | Caesar [or Bert]
Want to find "CaesarCaesar"again and again | Caesar

My query: Based on this answer, I came up with this Update-Statement:

UPDATE Table1
SET updated_foundItem = foundItem
FROM 
(select DISTINCT textWithFoundItemInIt, 
  regexp_extract(textWithFoundItemInIt, r'(?i)' || mappingItems) AS foundItem
from table1, (select string_agg(mappingItem, '|') mappingItems from table2))
WHERE ... ???

But I have no clue how the correct WHERE-statement should look like. Unfortunately I cannot test many queries in bigQuery, as my dataset is very large and even making a sample for testing is currently out of scope.

Thanks a lot for your help.

score 1 · Answer 1 · answered Dec 05 '22 at 13:50

once again thank you. I would like to show you the actual outcome, when I am using your query (in phpMyAdmin, and most likely as it will be in BQ). So it allocates every found item in the mapping table, to each row in the data table.

--------------------------------------------------------------------------------
|| hallo Adam                    |  Adam
|| hallo Adam                    |  Bert
|| hallo Adam                    |  Cesar
|| Bert says hello               |  Adam
|| Bert says hello               |  Bert
|| Bert says hello               |  Cesar
|| Bert says byebye              |  Adam
|| Bert says byebye              |  Bert
|| Bert says byebye              |  Cesar
.....

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). — Community, Dec 08 '22 at 08:34

score 0 · Accepted Answer · answered Nov 25 '22 at 13:15

I will provide the answer to the SELECT problem. A translation to the UPDATE statement can be applied directly from the answer.

Solution 1: One record per match.

If there is more than one match, the results will have more than one row matching field in table one with all matched fields in table two. One per row.

WITH Table_One AS (
SELECT 'hallo Adam Caesar' AS textWithFoundItemInIt UNION ALL
SELECT 'Jim' UNION ALL
SELECT 'Bert says hello' UNION ALL
SELECT 'Bert says byebye' UNION ALL
SELECT 'Want to find "Caesar"bdjehg' UNION ALL
SELECT 'Want to find "Caesar"bdjehg' UNION ALL
SELECT 'Want to find "Caesar"again'
),

Table_Two AS (

SELECT 'Adam' AS mappingItem UNION ALL
SELECT 'Bert' UNION ALL
SELECT 'Caesar'

)


SELECT 
Table_One.textWithFoundItemInIt, 
Table_Two.mappingItem
FROM Table_One
INNER JOIN Table_Two ON
Table_One.textWithFoundItemInIt Like CONCAT('%', Table_Two.mappingItem, '%');

Solution 2: Array struct for all matches.

So, one single row per value in table one and an array with all matches found in table two.

WITH Table_One AS (
SELECT 'hallo Adam Caesar' AS textWithFoundItemInIt UNION ALL
SELECT 'Jim' UNION ALL
SELECT 'Bert says hello' UNION ALL
SELECT 'Bert says byebye' UNION ALL
SELECT 'Want to find "Caesar"bdjehg' UNION ALL
SELECT 'Want to find "Caesar"bdjehg' UNION ALL
SELECT 'Want to find "Caesar"again'
),

Table_Two AS (

SELECT 'Adam' AS mappingItem UNION ALL
SELECT 'Bert' UNION ALL
SELECT 'Caesar'

)

SELECT 
Table_One.textWithFoundItemInIt, 
ARRAY_AGG(Table_Two.mappingItem) AS mappingItem FROM Table_One
INNER JOIN Table_Two ON
Table_One.textWithFoundItemInIt Like CONCAT('%', Table_Two.mappingItem, '%')
GROUP BY Table_One.textWithFoundItemInIt;

Since you don't specify how the output should look like (several rows or one single row per key in table one), you need to choose the best solution for your use case and rewrite the select to perform the update.

Hello, thank you for this answer. Unfortunately that one is not working, as it selects every entry and allocates all foundItems to it. So it's not really doing the mapping. The solution that I have linked (https://stackoverflow.com/questions/68998617/bigquery-mapping-tables-using-like-with-duplicate-rows) in my question works well. It's only the question about the WHERE clause. — Frank, Nov 27 '22 at 11:46
Hello, I think my answer is ok if you take into account the sample data and the expected output. I have even provided two different outputs to collapse or not the output using arrays. If your question was not complete or clear enough, please change it so that we can all clearly understand your expected inputs and outputs. — David Morales, Nov 27 '22 at 13:04
Hallo David, once again thank you. I have tried it again and it works as expected. Thanks a lot :) --- For MySQL version, simply change "ARRAY_AGG" to "group_concat". — Frank, Dec 21 '22 at 11:22

Updating bigquery by using a mapping tables using regular expressions

2 Answers2