2

When you are doing hundreds of thousands of queries on a db server, switching from

SELECT foo FROM bar WHERE rec_id=xyz
if (!record_found){
    INSERT INTO bar (x,y,z) VALUES (1,2,3)
}

to:

INSERT INTO bar (x,y,z) VALUES (1,2,3) ON DUPLICATE KEY UDPATE count=count+1

has a HUGE performance gains. The CPU load on the server drops drastically. However, if you have fields in the table that are set to auto-increment, you lose contiguity and they increase at a very rapid rate.

I do not care about the contiguity of the auto-increment field but am a little concerned about the exponential rate at which the auto-increment field is ballooning at the moment since I encounter a ton of duplicate records. We are talking about the auto-increment value reaching a value of 10 million+ when the actual number of records in the database is close to 0.5 million (in matter of a week).

The field is of type BIGINT(20) so I still have ways to go. I also understand that once it increases by a couple of orders of magnitude, the rate of change may not be that significant. If I ever breach that limit, can I continue to make the field bigger? Is this the best possible way to deal with this situation?

hvs
  • 518
  • 1
  • 5
  • 21
  • Similar but not exact question here: http://stackoverflow.com/questions/5924762/prevent-autoincrement-on-mysql-duplicate-insert – Dave Jan 24 '16 at 02:25
  • Does INSERT IGNORE not increment the value of auto-increment field, in case of insert failure? (for innodb storage engine) Can someone please answer this if you know definitively? I might be able to get around with this. – hvs Jan 24 '16 at 02:44
  • it probably does should take 2 minutes to test it. Do you really need me to ? I will – Drew Jan 24 '16 at 02:47

2 Answers2

1

Just include rec_id (that you seem to know and it appears to be a key, probably primary) into the insert:

INSERT INTO bar (rec_id, x,y,z) 
VALUES (xyz, 1,2,3) 
ON DUPLICATE KEY count=count+1
weirdan
  • 2,499
  • 23
  • 27
  • You are right about the fact that rec_id is a primary key. However, I don't know the value of rec_id value when I am inserting the record. Whether the record is duplicate or not is determined on the basis of another compound key which is also unique. The rec_id field exists to relate to other tables. – hvs Jan 24 '16 at 02:36
  • I was under impression that `rec_id` is known because your 'original' code (that you said you replaced with `insert ... on duplicate key`) uses it. As for gaps, InnoDB with default settings do generate them (it has something to do with concurrent inserts). InnoDB with `innodb_autoinc_lock_mode=0` generates no gaps, but this mode has serious scalability limitations (it's a global lock, basically). – weirdan Jan 24 '16 at 03:18
1

To answer the question in comments under the question:

It does increment AUTO_INCREMENT upon failure (so to speak) of insert ignore.

create table xyz
(   id int auto_increment primary key,
    thing varchar(20) not null,
    unique key(thing)
);

insert xyz(thing) values ('frog');
insert ignore xyz(thing) values ('frog');
insert xyz(thing) values ('lizard');

select * from xyz;
+----+--------+
| id | thing  |
+----+--------+
|  1 | frog   |
|  3 | lizard |
+----+--------+

The whole index gap anomaly is a well-known "feature" of innodb. Most of us accept it and just move on.

Drew
  • 24,851
  • 10
  • 43
  • 78
  • that said, let one of the guys below solve it for you. Don't mean to rain on their parade :P – Drew Jan 24 '16 at 03:00
  • Thanks for testing it out. I also confirmed it on my end. As I said, the gaps in the index are not a problem for me but the rate of change of index field is causing concern. Whether I should be concerned or not is what I am asking experts like yourself – hvs Jan 24 '16 at 03:02
  • Check out this answer of mine: http://stackoverflow.com/a/33666394 There is a deliberate reason I went with MyISAM. It was because of gaps, and I wanted to (half way through) show the counts (well the min,max,count together without confusing everyone). And because MyISAM vs INNODB can raise such a stink with different audiences, I had to disclaim the use of MyISAM upfront. – Drew Jan 24 '16 at 03:09
  • So I am not saying to go with MyISAM. But be aware that if you were to work a one row table (without truncate) or some other strategy into your mix, you won't have gaps. I wrote a INNODB script once that shrunk pockets of gaps down. But it becomes so risky to the integrity of mission critical data I would never suggest it. – Drew Jan 24 '16 at 03:10
  • In my case, going with myisam is not an option. I am using AWS Aurora mysql instance which does not support myISAM. – hvs Jan 24 '16 at 03:49
  • we need details such as a schema and throughput requirements. If you want the features of innodb, which one should want, you get the good with the bad. It's mostly good, by a country mile. – Drew Jan 24 '16 at 03:51
  • You can create a stored proc that performs the necessary retrieval of the next id from a INNODB table. Its sole function would be the next id to use. There are many ways to solve these problems, but the devils in the details and they are not here. And to save you the suspense, people spend weeks on these issues. – Drew Jan 24 '16 at 03:53