0

I have 300,000 records in table "businesses" and 50,000 in table "mint", both have a "phoneno" col which is cleansed (they are all in the same format so are viable to be my way of matching)

Im trying to check if they are in the businesses table before i insert them, so i made another column on the mint table and run the below query. the idea is to then insert all the null values into "businesses".

update mint t9
inner join  businesses b
  on  b.Telephone = t9.phoneno
set t9.bid = b.businessID 

The problem is the query just runs forever until it timesout and never gives me any results, even though i know there are some that should be matched. Am i doing something obviously wrong? -------------------------------------- edit-------------------------------

The columns in the tables that are relevant are :

---businesses-- BusinessID | Telephone

---mint-- phoneno | bid

The bid in mint is a null column i added, the idea was to insert into that if it existed in the businesses table so i could later run a query to insert the none existant records.

I am basically trying to check if that phone number from the mint table is already in businesses before i insert it, otherwise i want to update it.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Gaz Smith
  • 1,100
  • 1
  • 16
  • 30
  • Could you share the `mint` and `businesses` table definitions and a sample of what you expect the results to be? – Jonathon Ogden Jul 27 '16 at 21:08
  • @JonathonOgden sorry, i have made an edit. – Gaz Smith Jul 27 '16 at 21:12
  • any of these columns that are relevant also _keys_ or have _indices_ (indexes) on them at all? – Jonathon Ogden Jul 27 '16 at 21:20
  • Why you want check? .. if the column match then the update happend otherwise don't happen – ScaisEdge Jul 27 '16 at 21:26
  • @JonathonOgden Yes but they arn't relevent, im importing data from a 3rd party and checking via phonenumber if it already exists so i dont get duplicates – Gaz Smith Jul 27 '16 at 21:43
  • Is the below answer useful? Try commenting, upvoting, accepting answers on the Stack. It encourages people to take the time for you and others in the future. – Drew Aug 02 '16 at 15:19

1 Answers1

0

A few ways come to mind.

Segments: Due to the timeout and this is common, perform your update in segments. I would often do this such as

update mint t9
inner join  businesses b
on  b.Telephone = t9.phoneno
set t9.bid = b.businessID 
where t9.id between 1 and 50000;

update mint t9
inner join  businesses b
on  b.Telephone = t9.phoneno
set t9.bid = b.businessID 
where t9.id between 50001 and 100000;

etc. you get the idea. Reason being that InnoDB is logging this as an implicit transaction and your timeout occurs. I have this problem when I go against tens of millions of rows.

or

IODKU: Use INSERT ... ON DUPLICATE KEY UPDATE. See one of my examples here.

or

Use INSERT IGNORE ...

Insert ignore is definitely the wild west way of doing it but if you are careful in its use and you are experienced, it is great. If a key clash occurs, the whole row is skipped. Note, this may not be your use case.

Graham
  • 7,431
  • 18
  • 59
  • 84
Drew
  • 24,851
  • 10
  • 43
  • 78