49

I know MySQL has an issue with concurrent SELECT and INSERT. My question, however, is this: if I open two connections to MySQL and keep loading data through both of them, does MySQL take the data concurrently, or does it wait for one load to finish before starting the other?

I’d like to know how MySQL behaves in both cases: when I load data into the same table and when I load data into different tables concurrently over separate connections.

Kamrul Khan

4 Answers

54

If you open a second connection to the database and perform inserts over both connections, then from the database's perspective the inserts will still be sequential.

The Concurrent Inserts page for MyISAM in the MySQL documentation says, in essence:

If MyISAM storage is used and the table has no holes, multiple INSERT statements are queued and performed in sequence, concurrently with the SELECT statements.

Note that there is no control over the order in which two concurrent inserts take place; that order is at the mercy of many different factors. To guarantee an order, you will by default have to sacrifice concurrency.
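
For what it is worth, the practical consequence is: do not rely on insertion order when reading back. Below is a minimal sketch (the `events` table and its columns are made up purely for illustration); the two sessions may interleave their inserts in any order, so a deterministic result has to come from an explicit ORDER BY.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE IF NOT EXISTS events (
    id      INT PRIMARY KEY,
    payload VARCHAR(32)
);

-- session 1
INSERT INTO events (id, payload) VALUES (1, 'a'), (2, 'b');

-- session 2 (may be applied before, after, or interleaved with session 1)
INSERT INTO events (id, payload) VALUES (3, 'c'), (4, 'd');

-- any session: the ordering guarantee comes from the query itself,
-- not from the order in which the inserts happened to be applied
SELECT id, payload FROM events ORDER BY id;
```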

displayName
  • so what if it is the same database but different tables? – Kamrul Khan Aug 30 '15 at 16:50
  • @user3360140: in that case the connections will not be competing to lock the table. Therefore they both will get the lock and will be able to insert to both tables at the same time. – displayName Aug 30 '15 at 16:52
  • awesome. Thanks a lot! – Kamrul Khan Aug 30 '15 at 17:10
  • Can you please update your answer as per my update? Also, you mentioned, "Concurrent insertions are still subject to the concurrent_insert parameter. Make sure it is not set to 0"; can you please explain that a bit further? I know concurrent_insert has to be enabled for concurrent SELECTs and INSERTs, but how does it affect multiple parallel inserts (same table / different table)? – Kamrul Khan Aug 30 '15 at 17:51
  • @user3360140: Updated. Sorry for the confusion there. Inserts will always be sequential. The `concurrent_insert` param doesn't change anything in that regard. It is about concurrent `SELECT` and `INSERT`. – displayName Aug 30 '15 at 18:07
  • Imagine I am inserting rows based on the primary key: say one thread is inserting rows whose keys lie in the range 1-100, and another thread is inserting rows whose keys lie in the range 200-300. If it were one thread, we could be sure the rows would be stored in order of the primary key, which is an integer. But here, because the inserts are concurrent, is it guaranteed that the resulting rows will still be sorted by primary key? – SexyBeast Jun 05 '16 at 13:11
  • @AttitudeMonger: I don't think the inserts will be sorted by primary key; there is no particular reason for them to be. However, there is almost always an index on the primary key, and that index can be maintained as a self-balancing tree. – displayName Jun 05 '16 at 14:33
  • If I add an index, will it ensure that even if the inserts are not consecutive, the data will be stored in a sorted manner (through clustered index) ? – SexyBeast Jun 05 '16 at 15:37
  • And even though there is no index on the primary key I have added explicitly, say if I insert 2 before 1, but when I query `select all`, 1 comes before 2 in the result set. So I assume it is sorted internally during each insert. The same behaviour will surely hold true for concurrent inserts as well? – SexyBeast Jun 05 '16 at 15:54
  • @AttitudeMonger: What are you really trying to achieve? Why is it a requirement for you to have data stored in a sorted manner? If you want 1 before 2 in the result set, a simple declarative way is to add an ORDER BY clause in your query. – displayName Jun 05 '16 at 15:58
  • Well, sorted primary key is critical to prevent fragmentation. A random set of non-consecutive integers is just like setting a GUID as primary key, it can cause heavy fragmentation. – SexyBeast Jun 05 '16 at 16:01
  • @AttitudeMonger: Maintaining sorted primary key with concurrent and unsorted inserts will cause your inserts to limp. You will be effectively performing Insertion sort in this case. The fragmentation that you are worried about can be taken care of by (a) Caching in large RAM and (b) Proper indexing for fast retrieval. – displayName Jun 05 '16 at 16:18
  • Maybe I am asking a very basic question, but is the statement above about inserts also true for updates? Are updates also sequential? – Dhruv Nov 15 '18 at 11:33
  • @dhroove: [Updates have to be handled differently](https://stackoverflow.com/a/1205379/1835769). – displayName Nov 15 '18 at 13:17
  • Is this in any way related to optimistic concurrency? https://en.wikipedia.org/wiki/Optimistic_concurrency_control – Ridhwaan Shakeel Oct 24 '20 at 19:19
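
Regarding the question in the comments above about inserting into different tables over separate connections, here is a minimal two-session sketch (the table names are hypothetical, and MyISAM is assumed to match the answer): with table-level locking the sessions only compete when they target the same table, so against different tables they can proceed at the same time.

```sql
-- Hypothetical MyISAM tables, for illustration only
CREATE TABLE IF NOT EXISTS log_a (msg VARCHAR(64)) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS log_b (msg VARCHAR(64)) ENGINE=MyISAM;

-- session 1: takes the table lock on log_a only
INSERT INTO log_a (msg) VALUES ('from connection 1');

-- session 2: takes the table lock on log_b, so it does not wait for session 1
INSERT INTO log_b (msg) VALUES ('from connection 2');
```
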
20

MySQL does support parallel data inserts into the same table.

But the approach to concurrent reads and writes depends on the storage engine you use.

InnoDB

MySQL uses row-level locking for InnoDB tables to support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
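
As a minimal sketch of what row-level locking buys you (the `orders` table below is made up for illustration), two sessions can insert into the same InnoDB table at the same time; they only block each other when they contend for the same rows or unique-key values.

```sql
-- Hypothetical InnoDB table, for illustration only
CREATE TABLE IF NOT EXISTS orders (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    item VARCHAR(64)
) ENGINE=InnoDB;

-- session 1
INSERT INTO orders (item) VALUES ('apple');

-- session 2: not blocked by session 1, because InnoDB locks rows,
-- not the whole table
INSERT INTO orders (item) VALUES ('banana');
```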

MyISAM

MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time, making them more suitable for read-only, read-mostly, or single-user applications.

However, the above-mentioned behavior of MyISAM tables can be altered with the concurrent_insert system variable in order to achieve concurrent writes. Kindly refer to this link for details.
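
As a rough sketch, the variable can be inspected and changed like this (changing it globally requires the appropriate privilege; 0/NEVER disables concurrent inserts, 1/AUTO allows them for tables without holes, and 2/ALWAYS allows them even when the table has holes). It only affects concurrent SELECT and INSERT on MyISAM, not parallel inserts in general.

```sql
-- Check the current setting
SHOW VARIABLES LIKE 'concurrent_insert';

-- Allow concurrent SELECT + INSERT on MyISAM tables even when they have holes
SET GLOBAL concurrent_insert = 2;
```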

Hence, as a matter of fact, MySQL does support concurrent inserts for both the InnoDB and MyISAM storage engines.

seahawk
7

You are asking about deadlock detection, ACID, and particularly MVCC, locking, and transactions:

Deadlock Detection and Rollback

InnoDB automatically detects transaction deadlocks and rolls back a transaction or transactions to break the deadlock. InnoDB tries to pick small transactions to roll back, where the size of a transaction is determined by the number of rows inserted, updated, or deleted. When InnoDB performs a complete rollback of a transaction, all locks set by the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the statement may be preserved. This happens because InnoDB stores row locks in a format such that it cannot know afterward which lock was set by which statement.

https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html
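
A classic two-session sketch of how such a deadlock arises (the `accounts` table and the values are hypothetical): each session locks one row and then asks for the row the other session already holds, so InnoDB picks a victim, rolls it back, and the application is expected to retry that transaction.

```sql
-- session 1
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;   -- locks row 1

-- session 2
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;   -- locks row 2

-- session 1: now waits for row 2, which session 2 holds
UPDATE accounts SET balance = balance + 10 WHERE id = 2;

-- session 2: requests row 1 -> deadlock; InnoDB rolls one transaction back
-- with ER_LOCK_DEADLOCK (error 1213), and that transaction should be retried
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
```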

Locking

The system of protecting a transaction from seeing or changing data that is being queried or changed by other transactions. The locking strategy must balance reliability and consistency of database operations (the principles of the ACID philosophy) against the performance needed for good concurrency. Fine-tuning the locking strategy often involves choosing an isolation level and ensuring all your database operations are safe and reliable for that isolation level.

http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_locking
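
For example, the isolation level can be chosen per session or per transaction; the sketch below (with a hypothetical `orders` table) only shows the syntax, not a recommendation for any particular level.

```sql
-- Applies to the subsequent transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
SELECT COUNT(*) FROM orders;   -- reads under the chosen isolation level
COMMIT;
```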

ACID

An acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles. Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back. The database remains in a consistent state at all times -- after each commit or rollback, and while transactions are in progress. If related data is being updated across multiple tables, queries see either all old values or all new values, not a mix of old and new values. Transactions are protected (isolated) from each other while they are in progress; they cannot interfere with each other or see each other's uncommitted data. This isolation is achieved through the locking mechanism. Experienced users can adjust the isolation level, trading off less protection in favor of increased performance and concurrency, when they can be sure that the transactions really do not interfere with each other.

http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_acid
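
A minimal sketch of the atomicity part (the `accounts` table and amounts are made up): both updates take effect together on COMMIT, or neither is applied if the transaction is rolled back.

```sql
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- or ROLLBACK; to undo both updates
```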

MVCC

InnoDB is a multiversion concurrency control (MVCC) storage engine which means many versions of the single row can exist at the same time. In fact there can be a huge amount of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of SELECT query which is currently running

https://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/
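
A small two-session sketch of the effect (hypothetical InnoDB `accounts` table, default REPEATABLE READ isolation): session 1's consistent reads keep returning the row version from its first read, even after session 2 commits a change.

```sql
-- session 1
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- say this returns 100

-- session 2 (autocommit on): changes and commits the row
UPDATE accounts SET balance = 0 WHERE id = 1;

-- session 1: still inside its transaction, it still sees the old version (100)
SELECT balance FROM accounts WHERE id = 1;
COMMIT;
```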

Anatoly
4

It depends.

It depends on the client -- some clients allow concurrent access; some will serialize access, thereby losing the expected gain. You have not even specified PHP vs Java vs ... or Apache vs ... or Windows vs ... Many combinations simply do not provide any parallelism.

If different tables, there is only general contention for I/O, CPU, Mutexes on the buffer_pool, etc. A reasonable amount of parallelism is possible.

If same table, it depends on the indexes and access patterns. In some cases the threads will block each other. In some cases it will even "deadlock" and roll back one of the transactions. Deadlocks not only slow you down, but make you retry the inserts.

If you are looking for high-speed ingestion of a lot of rows, see my blog. It lays out techniques and points out several of the ramifications, such as replication, Engine choice, and multi-threading.

Multiple threads inserting into the same tables -- It depends a lot on the values you are providing for any PRIMARY or UNIQUE keys. It depends on whether other actions are taken in the same transaction. It depends on how much I/O is involved. It depends on whether you are doing single-row inserts or batching. It depends on ... (Sorry to be vague, but your question is not very specific.)
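
As one concrete illustration of the batching point (the table and values are hypothetical), a single multi-row INSERT inside an explicit transaction usually ingests far faster than one autocommitted INSERT per row.

```sql
START TRANSACTION;
INSERT INTO events (id, payload) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');
COMMIT;
```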

If you would like to present specifics on two or three designs, we can discuss the specifics.

Rick James