
I have a problem with high-frequency inserts in MySQL. I've searched a lot on the Internet but haven't found a good answer to my problem.

I need to log a lot of events at a very high frequency (~3000 inserts/s, i.e. ~260 million rows per day). These events are stored in an InnoDB table like this:

log_events:
 - id_user: BIGINT
 - id_event: SMALLINT
 - date: INT
 - data: BIGINT (data associated with this event)

My problems are:
- How do we speed up inserts? Events are sent by thousands of visitors, so we are not able to bulk insert.
- How do we limit write IO? We are on 6×600 GB SSD drives and have write IO problems.

Do you have any ideas for this kind of problem?

Thanks

François

  • Are you reading from this log at the same time, or does that occur offline? If not, perhaps you would do better to log to a file and subsequently import into your RDBMS? If you do need the log to be immediately in the RDBMS, do you need to enforce ACID compliance? – eggyal Apr 22 '13 at 00:42
  • We can accept a 5-minute delay between an event occurring and its availability in the DB. I've thought about logging the events to a log file on the filesystem, but I suppose we'll have big concurrent-write problems, no? – user1892972 Apr 22 '13 at 00:50
  • There are other ways to deal with concurrency than using an RDBMS: if that was your sole motivation, it'd be like using a sledgehammer to crack a nut. How frequent are read queries? What indexes (if any) must you maintain? Do you need to enforce ACID compliance? – eggyal Apr 22 '13 at 00:52
  • @eggyal Concerning the indexes, we have: (date), (id_event, data), (id_event, date, data), (id_user). We don't need ACID compliance. Read queries are maybe 1 per minute, maximum. – user1892972 Apr 22 '13 at 00:55
  • Okay. Could you advise what OS you're using, and on what filesystem MySQL's data files are stored? I'm leaning toward suggesting that you build a (thread-safe) application that writes to a buffer in RAM and flushes to the RDBMS once the buffer is full. Since you don't require ACID compliance, you may find that InnoDB is producing IO that you just don't need; perhaps another storage engine, such as MyISAM, would suit you better? But I'm voting to migrate to [dba.se], as they will have a much better idea how to tune your MySQL installation for this application. – eggyal Apr 22 '13 at 01:23
  • @eggyal Thanks for the link, I'll post on that forum too. We are using Debian GNU/Linux 6.0, with the default filesystem. We are not good with Linux development (we are just PHP developers who manage billions of records :) ). Do you have any idea how we could develop a thread-safe application that writes to RAM? Concerning MyISAM, we had tested it, but the performance was not as good as InnoDB's. – user1892972 Apr 22 '13 at 01:31
  • What happens when you run out of space in approx 5 months? – symcbean Apr 24 '13 at 15:30
  • @eggyal - I was rather under the impression that MySQL+InnoDB *was* a threadsafe application that writes to a buffer in RAM and flushes to the *DISK* when the buffer is full? (BTW, with the required synchronization, it is very hard to create such a program from scratch that performs efficiently.) – symcbean Apr 24 '13 at 22:01
  • @symcbean: It's certainly threadsafe, but the ACID-compliant nature of InnoDB results in considerably more disk IO (every transaction must be written to disk on commit in order to achieve Durability). – eggyal Apr 25 '13 at 08:00

2 Answers


Do you have any foreign keys on that table? If so, I would consider removing them and adding indexes only on the columns that are used for reads. This should improve write performance.
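
A minimal sketch of that tuning (not from the original answer): the foreign-key name is hypothetical, and the index list is taken from the question's comments above. Whether you keep all four indexes depends on which reads you actually run; each secondary index adds IO to every insert.

```php
<?php
// Hypothetical tuning sketch. The constraint name fk_log_events_user and
// the connection credentials are made up; the indexes mirror the ones the
// asker listed in the comments. FK checks and extra secondary indexes
// both add IO to every insert, so trim anything reads don't need.
$mysqli = new mysqli('localhost', 'loguser', 'secret', 'logs');

$mysqli->query('ALTER TABLE log_events DROP FOREIGN KEY fk_log_events_user');

$sql = <<<'SQL'
ALTER TABLE log_events
    ADD INDEX idx_date (date),
    ADD INDEX idx_event_data (id_event, data),
    ADD INDEX idx_event_date_data (id_event, date, data),
    ADD INDEX idx_user (id_user)
SQL;
$mysqli->query($sql);
```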

The second idea is to use an in-memory store (e.g. Redis, memcached) as a queue, with a worker that pulls data from it and bulk-inserts into MySQL (for example, every 2 seconds).
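
A minimal sketch of that queue/worker pattern, assuming the phpredis extension, a Redis list named log_events holding one "id_user,id_event,date,data" string per event, and made-up credentials and batch size:

```php
<?php
// Hypothetical worker: web requests LPUSH one "id_user,id_event,date,data"
// string per event onto the "log_events" Redis list; this single process
// drains the list every 2 seconds and issues one multi-row INSERT instead
// of thousands of single-row ones. All names/limits are assumptions.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$mysqli = new mysqli('localhost', 'loguser', 'secret', 'logs');

while (true) {
    $rows = [];
    // Drain up to 10,000 queued events for this batch.
    while (count($rows) < 10000 && ($raw = $redis->rPop('log_events')) !== false) {
        $f = explode(',', $raw);
        if (count($f) === 4 && ctype_digit(implode('', $f))) {
            $rows[] = vsprintf('(%d,%d,%d,%d)', $f); // %d keeps it injection-safe
        }
    }
    if ($rows) {
        $mysqli->query('INSERT INTO log_events (id_user, id_event, date, data) VALUES '
            . implode(',', $rows));
    }
    sleep(2); // flush interval suggested above
}
```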

Another option, if you don't need frequent reads, is to use the ARCHIVE storage engine instead of InnoDB: http://dev.mysql.com/doc/refman/5.5/en/archive-storage-engine.html. But it doesn't look like an option for you, since ARCHIVE has no indexes at all (which means every read is a full table scan).

Another option is to reorganize your DB structure, e.g. with partitioning (http://dev.mysql.com/doc/refman/5.5/en/partitioning.html). But it depends on what your SELECTs look like.
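
For illustration, a sketch of day-based range partitioning on the INT `date` column (the partition names and unix-timestamp boundaries are illustrative only, not from the original answer). Old days can then be discarded cheaply with ALTER TABLE ... DROP PARTITION instead of a huge DELETE:

```php
<?php
// Hypothetical partitioning sketch. Note: in MySQL 5.5 the partitioning
// column must be part of every unique key on the table; the question's
// table appears to have none, so this should be legal as-is.
$mysqli = new mysqli('localhost', 'loguser', 'secret', 'logs');

$sql = <<<'SQL'
ALTER TABLE log_events
PARTITION BY RANGE (date) (
    PARTITION p20130422 VALUES LESS THAN (1366675200),
    PARTITION p20130423 VALUES LESS THAN (1366761600),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
)
SQL;
// 1366675200 = 2013-04-23 00:00 UTC, 1366761600 = 2013-04-24 00:00 UTC.
$mysqli->query($sql);
```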

My additional questions are:

  • could you show the whole table definition?
  • which fields are used for reads? Could you show those queries?
  • do you need all of the data for your reads, or only recent data? If so, how recent must it be (e.g. only from the last day/week/month/year)?
  • id_event is an event type, right? Is the number of possible event types static, or could it change in the future?
Cyprian

"Events are sent by thousands of visitors, so we are not able to bulk insert."

You need to either bulk insert or shard the data. I would be tempted to try the bulk insert route first.

That you think you can't suggests these events are being created by autonomous processes - you just need to funnel them through an intermediary rather than directly to the database. And it would be easiest to implement that funnel as an event-based server (rather than a threaded or forking server).

You don't say what the events are or where they originate, which has some impact on the details of implementing a solution.
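
Since those details are missing, here is only a rough sketch of such a funnel, assuming events arrive as "id_user,id_event,date,data" UDP datagrams on a made-up port; the flush interval, credentials and names are all illustrative:

```php
<?php
// Hypothetical funnel sketch: a single-process, event-based UDP server
// that buffers incoming events in RAM and flushes them to MySQL as one
// multi-row INSERT every 2 seconds. Port, credentials and the
// "id_user,id_event,date,data" packet format are all assumptions.
$mysqli = new mysqli('localhost', 'loguser', 'secret', 'logs');
$sock = stream_socket_server('udp://0.0.0.0:9999', $errno, $errstr, STREAM_SERVER_BIND);

$buffer = [];
$lastFlush = microtime(true);

while (true) {
    $read = [$sock];
    $write = $except = null;
    // Wake at least twice a second so the flush timer fires even when idle.
    if (stream_select($read, $write, $except, 0, 500000) > 0) {
        $packet = stream_socket_recvfrom($sock, 512);
        $f = explode(',', trim($packet));
        if (count($f) === 4 && ctype_digit(implode('', $f))) {
            $buffer[] = vsprintf('(%d,%d,%d,%d)', $f);
        }
    }
    if ($buffer && microtime(true) - $lastFlush >= 2.0) {
        // One bulk statement instead of thousands of single-row INSERTs.
        $mysqli->query('INSERT INTO log_events (id_user, id_event, date, data) VALUES '
            . implode(',', $buffer));
        $buffer = [];
        $lastFlush = microtime(true);
    }
}
```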

Both rsyslog and syslog-ng will talk to a MySQL backend - hence you can eliminate the overhead of establishing a new connection per message - but I don't know whether either implements buffering / bulk inserts. It would certainly be possible to tail the files they produce with a single process and create bulk inserts from there.
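
A minimal sketch of that tail-and-batch idea, assuming a log file path and a one-event-per-line "id_user,id_event,date,data" format (both made up):

```php
<?php
// Hypothetical tailer: a single process follows a file that rsyslog /
// syslog-ng (or the web tier) appends to, and turns each batch of new
// lines into one multi-row INSERT. Path, credentials and line format
// are assumptions.
$mysqli = new mysqli('localhost', 'loguser', 'secret', 'logs');

$fh = fopen('/var/log/app/events.log', 'r');
fseek($fh, 0, SEEK_END);            // start at the current end of the file

while (true) {
    $rows = [];
    while (($line = fgets($fh)) !== false) {
        $f = explode(',', trim($line));
        if (count($f) === 4 && ctype_digit(implode('', $f))) {
            $rows[] = vsprintf('(%d,%d,%d,%d)', $f);
        }
    }
    if ($rows) {
        $mysqli->query('INSERT INTO log_events (id_user, id_event, date, data) VALUES '
            . implode(',', $rows));
    }
    fseek($fh, 0, SEEK_CUR);        // clear the EOF flag so new appends are seen
    sleep(2);                       // poll for newly appended lines
}
```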

It would be relatively simple to write such a funnel using an event-based server and a buffering tool, along with a bit of code to implement async mysqli calls and a watchdog. Or you could use node.js with an async MySQL lib. There are also tools like statsd (again using node.js) which can perform some aggregation on the data.

Or you could just write something from scratch.

A write-only database is a useless piece of hardware, though. You've not provided any details of how this data will be used, which has some relevance to designing a solution. Also, since ideally the data feed would be a single process / DB session, it might be a better idea to use MyISAM rather than InnoDB (I see in a later comment that you said you had problems with MyISAM - presumably that was with multiple clients).

symcbean