-1

I have a table with one column (about 1 million rows) :

Table A (InnoDB)

ColA (no index, no primary key)

I also have a table (3 million rows):

Table B (InnoDB)

ColA (PK INDEX) | count

Table B is a Superset of Table A. Table B contains every row I need, while Table A only contains some rows. I want to increment the count field of Table B by 1 for every row which exists in Table A. The problem is Table A is quite large so I can't use a normal UPDATE command with a where clause because the where clause would have 1 million conditions. I am really looking for the most optimized way in terms of performance to increment these fields. The solution I can think of is :

UPDATE TABLE_B set count = count + 1 where ColA IN (SELECT ColA FROM TABLE_A)

But that would require two full index scans on both tables so I am not sure if it is the fastest way to do this task.

user2924127
  • 6,034
  • 16
  • 78
  • 136

3 Answers3

2

First, add an index on table A:

create index idx_tablea_cola on table_a(cola)

Then phrase your query as:

update table_b b
    set count = count + 1
    where exists (select 1 from table_a a where a.colA = b.colA);

This assumes that there are no duplicates in tablea. Or, at least if there are, you still only want the counter incremented by 1.

WIthout an index, this will be a painful operation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

I found a faster way. It seems to be about 30% -35% faster than the other answer:

INSERT
INTO   TABLE_B
SELECT colA, 1 FROM TABLE_A
ON DUPLICATE KEY
UPDATE  count = count + 1
user2924127
  • 6,034
  • 16
  • 78
  • 136
  • Are there records in `table_a` that are not in `table_b`? – Gordon Linoff Sep 14 '14 at 22:47
  • No, all records in table A are in table B. I understand this isn't used for this case usually, but it produces the same result and seems to be much quicker. – user2924127 Sep 14 '14 at 23:14
  • That is quite interesting. Originally, I was thinking "this answer doesn't answer the question being asked". But, with no new records it does and the performance is better. – Gordon Linoff Sep 15 '14 at 02:22
1

Ensuring ColA is indexed in both tables, this would work:

UPDATE B set count = count + 1 
WHERE (
    SELECT 1 FROM A 
    WHERE A.ColA = B.ColA LIMIT 1
) IS NOT NULL
George G
  • 7,443
  • 12
  • 45
  • 59
  • 1
    You should use `exists` rather than the construct you have. – Gordon Linoff Sep 14 '14 at 20:48
  • you are wrong, this is most optimal way, just think man `exists` is equal something to check if required statement is met even once, so I've already made my sub-query as much optimal that `exists` can not add more, even it's even extra calling for function – George G Sep 14 '14 at 20:57
  • 1
    This seems to be the fastest way. Do you think the InnoDB is the best engine for Table B that is only used for updates (incrementing one field) by only one user at a time? – user2924127 Sep 14 '14 at 21:14
  • Yes `InnoDB` is better, for more see http://stackoverflow.com/questions/20148/myisam-versus-innodb – George G Sep 14 '14 at 21:18