0

I have a small email newsletter system and when a new email address is added (through mass import) it defaults to "subscribed int(11) DEFAULT '1'`".

I then have the below query which looks for the email address and updates any that are already in the table but are un-subscribed :

UPDATE emailData SET subscribed =  '0'
WHERE subscribed = '1' and emailAddress IN

(
  SELECT emailAddress FROM
  (SELECT emailAddress FROM emailData  WHERE subscribed = '0' GROUP BY emailAddress) AS tmptable

) 

With around 5000 duplicates it takes around 15 seconds to execute (VM Server) and I wanted to know if there was a better / faster way to do this?

Dharman
  • 30,962
  • 25
  • 85
  • 135

2 Answers2

0

Yes, there's a better way - don't have duplicates. Create a unique (primary?) index on email address and set your import operation to ignore duplicates (otherwise it will still do the import but generate lots of errors).

symcbean
  • 47,736
  • 6
  • 59
  • 94
-1

You may get performance by below steps-

1. Need to change query as per below-

    UPDATE emailData AS ed JOIN emailData AS ed1 
    ON ed.emailAddress = ed1.emailAddress 
    SET ed.subscribed =  '0'
    WHERE ed.subscribed = '1' AND ed1.subscribed = '0';

2. emailAddress field data length should be short as much possible may be varchar(50) or varchar(60) if possible.

3. make a composit index on emailAddress and subscribed fields.

Note: If emailAddress field is text or like varchar(250) etc and you can't short it then make partial index like first 50 characters only which will be enough and fast.
Zafar Malik
  • 6,734
  • 2
  • 19
  • 30
  • "emailAddress field data length should be as short..." - firstly, that will have no impact if its indexed correctly and secondly: http://stackoverflow.com/questions/386294/what-is-the-maximum-length-of-a-valid-email-address, thirdly, if it is a varchar field and *not* properly indexed then the cost of the comparison is determined by the length of data NOT the maximum size of the attribute. – symcbean Feb 08 '16 at 15:55
  • @symcbean :Here I am not stating that what is standard length of email just stating that better will be if can keep length short if possible other wise create partial index....length of index is matter or not in performance...you can verify by a practical example with billions rows and text column.... – Zafar Malik Feb 09 '16 at 06:34
  • "better will be if can keep length short" - No, in no way is it better to truncate the data - a partial email address is useless, it has no impact on performance, I have no idea what you mean by a "partial index" in this context (do you mean a prefix index)? How does a less selective index improve performance? – symcbean Feb 09 '16 at 11:49
  • @symcbean : you are taking my statements in wrong way even in negatively I meant suppose all emails length is under varchar(60) then keep length as varchar(80) or 100...if it is not the case and 90% emails are under 50 char then can use prefix index (partial index) on left 50 char which will satisfied max. cases where it will not satisfy query will do table scan which will be rare to rare case. – Zafar Malik Feb 09 '16 at 12:02