
I have two tables and want to update table1 (original data table) with data found in table2 (mapping table) using a LIKE condition. However, all of my attempts produce the same error message:

Query error: UPDATE/MERGE must match at most one source row for each target row

Table1 (data table)

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  
Bert says hello               | 
Want to find "Caesar"bdjehg   |

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar

Expected result

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Want to find "Caesar"bdjehg   |  Caesar

Queries:

UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
    WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');


UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
     WHERE INSTR(textWithFoundItemInIt , mt.mappingItem) > 0;


UPDATE `table1`
SET foundItem = (SELECT mt.mappingItem FROM `table2` AS mt
WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
)
WHERE TRUE; 


UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `table1`
inner join  `table2` mt on textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');

I have also removed all duplicate values from both table1 and table2, but I still get the same error message. I also tried a join statement, but that produced this error message: "Alias table1 in the FROM clause was already defined as the UPDATE target"

I found these similar questions on SO and tried their approaches:

Unfortunately, they did not solve my problem, so I don't think this is a duplicate question.

Thank you very much for your thoughts.


FOLLOW UP QUESTION

I am referring to the solution posted by @Jon. Once again, thank you for your help. However, after testing it with different data, there is still the issue that it does not work if there are duplicates in 'table1'. This problem clearly comes from the 'GROUP BY' clause; without it, the UPDATE query fails with the error message from my original question. Grouping by every column doesn't work either.

However, there can be duplicates in my 'table1' (Data) and my mapping table 'table2'. So to make it very precise, this is my goal:

Table1 (data table)

textWithFoundItemInIt         | foundItem
-------------------------------------------
hallo Adam                    |  
Bert says hello               | 
Bert says byebye              | 
Want to find "Caesar"bdjehg   |
Want to find "Caesar"bdjehg   |
Want to find "Caesar"again    |
Want to find "CaesarCaesar"again and again | <== This is no problem, just finding one Caesar is enough 

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam

Expected result

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

It doesn't matter which Adam from Table2 is found and inserted into Table1; they are all the same. So it is even okay if the first Adam is overwritten by the second Adam, or if the query simply stops searching once one Adam is found.

If I execute Jon's 'SELECT' query, it would result in:

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

It (correctly) omits the duplicate "Want to find "Caesar"bdjehg" row, but that's unfortunately not what I need.

If it is easier, it would also be okay, in cases where two names are found in one row, to get either

textWithFoundItemInIt         | foundItem
---------------------------------------------
hallo Adam and Bert           |  Adam, Bert 
Bert says hello to Caesar     |  Bert, Caesar

or

textWithFoundItemInIt         | foundItem1      | foundItem2
---------------------------------------------------------------
hallo Adam and Bert           |  Adam           | Bert 
Bert says hello to Caesar     |  Bert           | Caesar

I hope this helps to understand my issue. In easy words: "It's just a mapping with multiple equal rows" ;-)
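For the comma-separated variant, something along these lines is what I have in mind (just a sketch, assuming BigQuery's STRING_AGG and its UPDATE ... FROM ... WHERE form; untested):

```sql
-- Sketch: collect ALL distinct matches per row into one comma-separated string.
-- The grouped subquery yields at most one source row per distinct text,
-- which is what the UPDATE/MERGE restriction requires.
UPDATE table1
SET foundItem = mt.items
FROM (
  SELECT t.textWithFoundItemInIt,
         STRING_AGG(DISTINCT m.mappingItem, ', ') AS items
  FROM table1 t
  JOIN table2 m
    ON t.textWithFoundItemInIt LIKE CONCAT('%', m.mappingItem, '%')
  GROUP BY t.textWithFoundItemInIt
) mt
WHERE table1.textWithFoundItemInIt = mt.textWithFoundItemInIt;
```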

Thanks a lot :)

Frank

1 Answer


Your logic does not guard against this case:

mappingItem
-----------
item1
item12

The pattern %item1% matches both item1 and item12, so a single table1 row can match more than one source row. There are several ways to avoid that, depending on how you want to handle such collisions in poorly structured data, but this is the reason for the error.

You can look for problems with something like this:

SELECT table1.textWithFoundItemInIt
     , COUNT(*)
  FROM table1
  JOIN table2
    ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
 GROUP BY table1.textWithFoundItemInIt 
HAVING COUNT(*) > 1

Once you decide how to handle these cases, you should be able to choose which of the matching candidates to use in the UPDATE.

Basically, make sure the logic limits the list of values to assign (per table1 row) to one (1) value.
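One direct way to enforce that limit is to aggregate inside a correlated subquery, which fixes your third attempt (a sketch; whether BigQuery accepts this exact form in an UPDATE is something you'd need to verify):

```sql
UPDATE table1
SET foundItem = (
      SELECT MIN(mt.mappingItem)  -- aggregate guarantees at most one value per target row
      FROM table2 mt
      WHERE table1.textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
    )
WHERE TRUE;
```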

Here's one approach. I'm not sure BigQuery supports this specific form, but it shows the logical idea.

Review the data, noting we have a case where more than one mappingItem matches a table1 row:

SELECT table1.textWithFoundItemInIt
     , COUNT(*)
     , MIN(table2.mappingItem) AS theItem1
     , MAX(table2.mappingItem) AS theItem2
  FROM table1
  JOIN table2
    ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
 GROUP BY table1.textWithFoundItemInIt 
HAVING COUNT(*) > 1
;

+-----------------------+----------+----------+----------+
| textWithFoundItemInIt | COUNT(*) | theItem1 | theItem2 |
+-----------------------+----------+----------+----------+
| Item12 is a problem   |        2 | item1    | item12   |
+-----------------------+----------+----------+----------+

Now adjust the UPDATE to pick the MIN(mappingItem) per table1 row when assigning the new value:

UPDATE table1
  JOIN ( SELECT textWithFoundItemInIt
              , MIN(mappingItem) AS mappingItem
           FROM table1
           JOIN table2
             ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
          GROUP BY table1.textWithFoundItemInIt 
       ) mt
    ON table1.textWithFoundItemInIt = mt.textWithFoundItemInIt 
   SET foundItem = mt.mappingItem
;

Review the result:

SELECT * FROM table1;

+----------------------------+-----------+
| textWithFoundItemInIt      | foundItem |
+----------------------------+-----------+
| hallo Item1                | item1     |
| Item2 says hello           | item2     |
| Item12 is a problem        | item1     |
| Want to find "Item3"bdjehg | item3     |
+----------------------------+-----------+

Note: This updates all target rows, even the problem rows, based on the original request. This can be adjusted to only touch those rows which don't already have a foundItem set, WHERE foundItem IS NULL.
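Since BigQuery doesn't accept the UPDATE ... JOIN form above (see the comments), a BigQuery-flavored rewrite would look roughly like this, using the UPDATE ... SET ... FROM ... WHERE shape BigQuery expects (a sketch; untested here):

```sql
-- Same logic as above, restated for BigQuery: pick MIN(mappingItem) per text,
-- so each target row matches at most one source row.
UPDATE table1
SET foundItem = mt.mappingItem
FROM (
  SELECT t.textWithFoundItemInIt,
         MIN(m.mappingItem) AS mappingItem
  FROM table1 t
  JOIN table2 m
    ON t.textWithFoundItemInIt LIKE CONCAT('%', m.mappingItem, '%')
  GROUP BY t.textWithFoundItemInIt
) mt
WHERE table1.textWithFoundItemInIt = mt.textWithFoundItemInIt;
```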

Jon Armstrong
  • Thank you for your reply. Sorry, my logic in this description was indeed not sufficient. The mapping items are different company names, which might not result in different matches. – Frank Aug 24 '21 at 12:54
  • @Frank In that case, if you think the data shouldn't result in the error, produce / show a minimal, but complete, test case which produces the error, with precise data. Very often, we just make assumptions that turn out to be false. We all do this. One suggestion is to test your assumption. Try the diagnostic query I showed to find unexpected matches. – Jon Armstrong Aug 24 '21 at 12:57
  • I tried your query and it results in an error: Star expansion expression references column string_field_0 which is neither grouped nor aggregated at [17:8], as there are more columns in my data table. I have my data set ready here: https://www.filemail.com/d/tkbqrwtanfugbdu – Frank Aug 24 '21 at 13:05
  • @Frank Sorry. Adjust the SQL as needed. `SELECT` only the columns of interest, like: `SELECT table1.textWithFoundItemInIt, COUNT(*)` ... Your data contains only one table of detail. Update your description to indicate which table is which and which columns are which. – Jon Armstrong Aug 24 '21 at 13:08
  • No worries ;-) I have adjusted the SELECT statement (my mistake, sry) and it works now. ... By the way: I am using only test data so far. My real data set at the end will have ~10m entries and much more text in the "textWithFoundItemInIt" column. – Frank Aug 24 '21 at 13:14
  • @Frank Did you find anything helpful using that query? I also noticed some possible problems in your csv test data, where the row delimiter `;;` may be confused by a column with missing data: `value1;;value3;etc...;;` – Jon Armstrong Aug 24 '21 at 13:18
  • Yes, it is very helpful! So far it works very well, thank you very much. The row delimiter should not be a problem for my real data, as that data is pulled directly from BigQuery and no import/export is needed :) But I see that it returns only 4 rows, while the original data (table1) has 78 rows, so there must be one mappingItem for each row. I adjusted the code to HAVING COUNT(*) >= 1 to find all of them :) Now I am cross-checking manually to see if everything worked out. – Frank Aug 24 '21 at 13:20
  • @Frank The SQL intentionally limits the result to just those having more than one match. The others aren't causing problems. You can mark this answer `helpful` if you wish and I will be able to update the answer with SQL to resolve the issue, once you decide how you wish to handle these cases. – Jon Armstrong Aug 24 '21 at 13:21
  • Yes, sure, I voted it as useful, but I do not have enough reputation yet for that to show ;-) But I have accepted it :) ... Just wondering: why are there rows with HAVING COUNT(*) > 1? Are there still duplicates? – Frank Aug 24 '21 at 13:24
  • Sorry for the next comment, but I am now struggling to put the code into an UPDATE statement. It returns: "Query error: Scalar subquery produced more than one element at [20:1]". I think I haven't understood your query well enough. I assume the HAVING COUNT is needed to avoid the problem with duplicates, but why are there duplicates? There are no duplicate rows in the original data. The GROUP BY is needed to be able to have a COUNT, right? Sorry for my further questions; I thought my task would be easier than it turned out to be. – Frank Aug 24 '21 at 14:02
  • @Frank Basically, you will want to limit the found mappings (from table2) to just one (per table1 row), so the assignment succeeds. Did you decide which mapping (of several) you would use? You could pick the MAX (highest alphabetically) or the MAX(length(x)). There are many ways. You can create a new question limited to just this problem, assuming you know which mapping you want. Create a tiny test case with just enough data to show the problem. When I have a moment, I will propose one approach. You might need to adjust it slightly for bigquery. – Jon Armstrong Aug 24 '21 at 14:19
  • ***Thank you very much for updating your answer!*** I am trying to get it running, but I get a syntax error: Unexpected keyword JOIN. The SELECT statement works very well; I have tried that part on its own. But it does not work together with the UPDATE. I found the reason for that: ***BQ expects a FROM and WHERE clause, so I have adjusted the statement and it now works very well.*** (At least as far as I can check manually) ;-) Now I am trying to get it running on my original data and hope it works out. Once again, thank you @Jon-Armstrong for your help. – Frank Aug 25 '21 at 11:11