
Please can you help me? I am getting this exception:

      Lock wait timeout exceeded; try restarting transaction

when I run this query:

    UPDATE Table1 INNER JOIN Table2
    SET Table1.xCoord = Table2.Longitude, Table1.yCoord = Table2.Latitude,
    Table1.location2 = CONCAT(Table2.Street, ', ', Table2.Post_Town, ', ', 
    Table2.County, ', ', Table2.Postcode, ', UK ')
    WHERE (Table1.location2 = '' OR Table1.location2 IS NULL) AND 
    (Table1.location1 LIKE CONCAT('%', Table2.Street, '%'))
    AND (Table1.location1 LIKE CONCAT('%', Table2.Post_Town, '%'));

Table2 has 30 million rows and Table1 has 4,000 rows (likely to increase). I have indexes on (1) street and (2) town. In reply to Rick, I've given example tables below.

  • Table 1

| location1 | location2 | postcode |
| --- | --- | --- |
| flat 30 234 harddrive street morganvale | (new string added here) | |

  • Table 2

| house no | street | town | postcode | xcoord | ycoord |
| --- | --- | --- | --- | --- | --- |
| 234 | harddrive street | morganvale | mr12 123 | 5.1234 | 24.2345 |

I'm matching the street name and town using the LIKE pattern. The commas have been removed and location1 is all lowercase.
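
To illustrate what the LIKE test is checking, here is a minimal standalone example against the literal values from the sample rows above (the strings are copied from the example tables, not real data):

    SELECT
        'flat 30 234 harddrive street morganvale' LIKE CONCAT('%', 'harddrive street', '%') AS street_match,
        'flat 30 234 harddrive street morganvale' LIKE CONCAT('%', 'morganvale', '%')       AS town_match;
    -- Both expressions return 1, so this Table1 row would be matched to that Table2 row
    -- and receive its xcoord, ycoord and concatenated address.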

I'm using AWS with 100 GB of storage (in order to increase write speeds) on a t2.small with 2 GB of RAM.

My question is: can this query be made to run faster, and how do I prevent the lock wait timeout exception? Is the query correct?


The aim of the query is to add x and y coordinates and a location to Table1 from the 30-million-row Table2.

I've been thinking of some alternatives: would PostgreSQL with PostGIS run this query faster than MySQL? What about tackling this join of the tables from an application, such as Java?

KDJ
  • This query cannot use indexes to speed up the WHERE criteria. Consider using FULLTEXT indexes and FULLTEXT search instead. – Shadow Jul 19 '17 at 00:30
  • Uh, your query creates a cartesian join between the 2 tables, resulting in 4000 × 30 million records with no possibility to use indexes. You need to rethink this exercise completely. – Shadow Jul 19 '17 at 00:34
  • And there is no way we can comment on whether the query is correct or not, because we do not know what it is supposed to do, nor do we know your data. – Shadow Jul 19 '17 at 00:38
  • Hi Shadow, I'm thinking of possible solutions: I should merge the street and town into one column, use a smaller table, and do one LIKE %mergedcolumn%. Alternatively, running the code from my application might go faster – KDJ Jul 19 '17 at 02:38
  • [link](https://stackoverflow.com/questions/14535846/how-do-i-load-100-million-rows-in-to-memory) This is a useful link. MySQL in AWS costs huge sums of money for high memory and storage, but if you have a high-performing EC2 you can upload the file to a collection and read from there. 16 GB will easily give you 100,000 rows per second, whereas AWS RDS is for me about 300-500 rows per second – KDJ Jul 19 '17 at 02:55
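
For reference, a rough sketch of the merged-column idea floated in the comments above. It assumes the street and town appear adjacent in location1, and the column and index names here (street_town) are hypothetical; this reduces the work to one LIKE per row pair, but a leading-wildcard LIKE still cannot use an ordinary index:

    -- Hypothetical helper column combining street and town:
    ALTER TABLE Table2 ADD COLUMN street_town VARCHAR(255);
    UPDATE Table2 SET street_town = CONCAT(Street, ' ', Post_Town);

    -- The join condition then collapses to a single pattern match per row pair:
    --   Table1.location1 LIKE CONCAT('%', Table2.street_town, '%')
    -- but every Table1/Table2 combination still has to be scanned.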

1 Answer

  • The query is a terribly devilish one -- it must pair up every row of table1 with every row of table2; that's 120 billion things to test. Be glad it aborted instead of running until the 6th Sunday of this month.
  • OR is a performance killer; consider picking either empty string or NULL, not both.
  • LIKE '%xx%' won't use an index.
  • Consider using FULLTEXT instead of LIKE.
  • 2GB of RAM either means that the innodb_buffer_pool_size is too small to cache the entire table, or that it is set so high that you are swapping. (Swapping is terrible for MySQL.) Find the value of that setting.
  • UPDATE needs and takes more serious locks than SELECTs. See if you can run SELECT id FROM tbl WHERE ... to get the ids for the rows in question, and see how long that takes. Then we can discuss possible alternatives.
  • Please provide SHOW CREATE TABLE for each table.
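
The checks suggested in the bullets above can be run directly. A minimal sketch, assuming Table1 has a primary key column named id (a hypothetical name) and reusing the column names from the question:

    -- How much of the data can InnoDB cache?
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- How long does it take just to FIND the rows, without UPDATE locks?
    -- (id is an assumed primary key column on Table1.)
    SELECT Table1.id
    FROM Table1 INNER JOIN Table2
    WHERE (Table1.location2 = '' OR Table1.location2 IS NULL)
      AND (Table1.location1 LIKE CONCAT('%', Table2.Street, '%'))
      AND (Table1.location1 LIKE CONCAT('%', Table2.Post_Town, '%'));

    -- Table definitions and existing indexes:
    SHOW CREATE TABLE Table1;
    SHOW CREATE TABLE Table2;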

Bottom line: Rethink the algorithm.
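
A minimal sketch of the FULLTEXT direction mentioned above and in the comments, assuming MySQL 5.6+ with InnoDB; the index name is hypothetical. Note that AGAINST() requires a constant search string, so Table2 cannot be joined through MATCH() directly; the search would have to be driven row by row, for example from the application KDJ mentions:

    -- Hypothetical FULLTEXT index on the free-text address column:
    ALTER TABLE Table1 ADD FULLTEXT INDEX ft_location1 (location1);

    -- For one Table2 row (values taken from the example tables), find candidate Table1 rows:
    SELECT *
    FROM Table1
    WHERE MATCH(location1) AGAINST('+"harddrive street" +morganvale' IN BOOLEAN MODE);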

Rick James
  • Hi Rick, would PostgreSQL with the PostGIS extension make such a call feasible? – KDJ Jul 19 '17 at 01:49
  • @KDJ MySQL by default has geospatial data types and functions. But you are not doing any geospatial operations, you are doing pattern matching. So, no, switching over to PostgreSQL will not solve anything. – Shadow Jul 19 '17 at 02:28
  • With PostgreSQL, the table is stored on the heap in sequential order, so that might affect speed – KDJ Jul 19 '17 at 02:39
  • Hi Shadow, what if I did the pattern match from an application rather than in the database: just read from both tables and, where the match is satisfied, write the updates one by one? – KDJ Jul 19 '17 at 02:41
  • @KDJ - the order of the table won't matter since it has to scan every row. – Rick James Jul 19 '17 at 04:20