Recent Discovery
Among everything else I tried, I replaced my JMeter profile with a custom JavaScript snippet that hit each of my API endpoints in turn in an infinite loop, then ran that script in parallel in different browsers (one Firefox, one Chrome, one Safari) to rule out issues caused by all of my connections coming from the same source (same user agent, same cookies, same session ID, etc.).
When I did this, all of my issues went away. The queries ran in parallel, and the app was far more responsive than JMeter would have you believe.
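(For anyone who wants to replicate this test without juggling three browsers, a minimal command-line equivalent using PHP's curl_multi would look something like the sketch below - the endpoint URLs are placeholders:)
<?php
// Hit all of the endpoints at once and print per-request timings.
// The URLs stand in for my real API endpoints.
$urls = [
    "http://localhost/api/endpoint1",
    "http://localhost/api/endpoint2",
    "http://localhost/api/endpoint3",
];
$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// Drive all transfers concurrently until the last one finishes.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
foreach ($handles as $url => $ch) {
    printf("%s: %.0fms\n", $url, curl_getinfo($ch, CURLINFO_TOTAL_TIME) * 1000);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);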
It seemed impossible to me that JMeter would be serializing the requests, given that it's a de facto standard for load testing, so I started trying to reproduce the behavior.
In an attempt to re-create the JMeter behavior, I created the following two PHP scripts, which (hopefully) simulate my Yii application:
slow.php
<?php
session_start();
$_SESSION['some'] = 'value';
// Yii is calling session_write_close() almost immediately after
// the session is initialized, but to try and exacerbate issues,
// I've commented it out:
// session_write_close();
$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
// Yii was using whatever the default persistence behavior was,
// but to try and exacerbate issues I requested a persistent
// connection. Note that PDO::ATTR_PERSISTENT only takes effect
// when passed as a constructor option, not via setAttribute():
$pdo = new PDO($dsn, "user", "password", [PDO::ATTR_PERSISTENT => true]);
// Simulate a query running for 1 second by issuing a 1-second sleep
$pdo->query("DO SLEEP(1)");
echo "Done";
fast.php
<?php
session_start();
$_SESSION['some'] = 'value';
$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
// As in slow.php, PDO::ATTR_PERSISTENT must be a constructor option:
$pdo = new PDO($dsn, "user", "password", [PDO::ATTR_PERSISTENT => true]);
// Simulate a query running for 0.1 seconds
$pdo->query("DO SLEEP(0.1)");
echo "Done";
Running JMeter against these two new endpoints, there was no serialization of requests. Everything ran in parallel: fast.php always returned in 100-150ms and slow.php always returned in 1000-1050ms, even as I scaled up to 3, 4, and 5 threads. I was able to watch things collapse at 11 threads, but that's because I exceeded the number of available PHP-FPM worker processes.
So to summarize:
- The issue only occurs when profiling my API with JMeter and is not inherent in the app itself
- The issue isn't just a JMeter bug, but is somehow tied to my application or Yii 1.1
- I tried, but could not come up with a minimal repro case
Despite the issue being non-existent when profiling with other tools, lots of people responded and gave lots of helpful information:
- Avoid persistent connections in PHP (they could cause multiple requests to share a connection, though that's probably not what's happening here)
- Avoid session locking by calling session_write_close() as early as possible (see the sketch after this list)
- Ensure you have enough PHP-FPM worker processes to handle the number of simultaneous connections
- MySQL fully supports parallel queries (if the hardware can handle it)
- Be wary of table locking (any transaction with an UPDATE statement could potentially lock the tables)
- MyISAM does table-level locking instead of row-level locking
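To make the session advice concrete, the pattern looks something like this (a minimal sketch; the session key is hypothetical):
<?php
session_start();
// Read or write whatever session state the request needs up front.
$userId = $_SESSION['user_id'] ?? null; // hypothetical key

// Release the session file lock immediately so concurrent requests
// sharing this session ID aren't serialized behind any slow work.
session_write_close();

// Slow work (database queries, API calls, etc.) happens lock-free.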
Original Post
I inherited a web application and I'm trying to make sense of its performance profile so I can start optimizing it for speed.
One thing I noticed pretty early on is that requests to the server are being serialized. For instance, suppose I have three endpoints with response times like so:
/api/endpoint1 --> 50ms
/api/endpoint2 --> 1000ms
/api/endpoint3 --> 100ms
If I hit a single endpoint, I measure the appropriate response times. But when I set up a script to hit all 3 at once I will sometimes see something like the following:
endpoint1: 50ms
endpoint2: 1050ms
endpoint3: 1150ms
Clearly the call to endpoint3 was queued and waiting for the call to endpoint2 to finish before it got a response.
My first thought was that this should be trivially solved with multithreading, so I took a look at the server configuration. PHP-FPM's process manager was set to "dynamic" with "start_servers" of 1, "max_children" of 5, and "max_spare_servers" of 2. For testing purposes I swapped this to "static" so that 5 PHP processes would remain open for handling connections in parallel (more than the 3 endpoints I was hitting, so they should all be able to process simultaneously).
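For reference, the static setup amounts to something like this in the pool configuration (a sketch; the file path and pool name vary by install):
; e.g. /etc/php-fpm.d/www.conf
pm = static
pm.max_children = 5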
This had no impact on performance, so I looked at my nginx config. "worker_processes" was set to 1 with "worker_connections" set to 1024. I know that nginx uses an event loop model, so it shouldn't be blocking while it waits for a response from PHP-FPM, but just in case, I bumped "worker_processes" up to 5.
Still no effect, so next I looked at the database. All 3 endpoints hit the database, and I know for a fact that the 1000ms response time is mostly spent waiting on a long-running database query. I tried setting "thread_pool_size" to 5, and within the MySQL REPL I also set "innodb_parallel_read_threads" and "mysqlx_min_worker_threads" to 5.
Still, my requests were getting serialized. When I logged into the MySQL REPL and typed show processlist; while my script was running (a while-true loop repeatedly hitting those 3 API endpoints), I noticed that there was only ever one connection from the web application's user.
Unfortunately I'm not sure whether my issue lies with the database (not allowing more than one connection), with PHP-FPM (not processing more than one request at a time), or with nginx (not forwarding more than one request at a time to PHP-FPM), and I'm not sure how to figure out which one is acting as the bottleneck.
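One thing I can do in the meantime is watch the connection count from the MySQL side while the load script runs. A rough monitoring sketch (the credentials and the app's MySQL user name are placeholders):
<?php
// Sample the process list once a second and count connections
// belonging to the application's MySQL user.
$pdo = new PDO("mysql:host=localhost", "root", "password");
while (true) {
    $rows = $pdo->query("SHOW PROCESSLIST")->fetchAll(PDO::FETCH_ASSOC);
    $count = 0;
    foreach ($rows as $row) {
        if ($row['User'] === 'app_user') { // hypothetical user name
            $count++;
        }
    }
    printf("%s  %d connection(s) from app_user\n", date('H:i:s'), $count);
    sleep(1);
}
If only one connection ever shows up while multiple PHP workers are busy, the bottleneck is upstream of MySQL; if several connections appear but the queries still serialize, the problem is inside the database.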
Update
Looking around some more, I found this SO post, which seems to suggest that MySQL doesn't support parallel queries from the same user (e.g. the web application's user).
Is this true? Surely such a ubiquitous database engine wouldn't have such a performance flaw, especially given how commonly it's used with AWS for massively scaled applications. I understand that for simple "read from disk" queries, parallelizing them wouldn't improve performance, since they'd just sit in a queue waiting on disk I/O. But modern databases have in-memory caches, and most of the really slow operations like filesort tend to happen in memory. There's no reason a disk-bound query couldn't run in parallel (make a request to disk and start waiting on I/O) while a CPU-bound query is busy sorting a table in RAM. The context switching might slightly slow down the CPU-bound queries, but if slowing those down from 1000ms to 1200ms means my 5ms query can run in 5ms, I think that's worth it.
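(For what it's worth, it's easy to convince yourself that a single MySQL user can run queries in parallel. Here's a sketch using mysqli's async API, with placeholder credentials; if the two SLEEPs run concurrently, the total comes out near 2 seconds rather than 4:)
<?php
$start = microtime(true);

// Issue the same slow query on two connections from the same user.
$pending = [];
for ($i = 0; $i < 2; $i++) {
    $link = mysqli_connect("localhost", "user", "password", "platypus");
    // MYSQLI_ASYNC returns immediately; the server keeps working.
    mysqli_query($link, "SELECT SLEEP(2)", MYSQLI_ASYNC);
    $pending[] = $link;
}

// Wait for both queries to finish.
while ($pending) {
    $read = $error = $reject = $pending;
    if (mysqli_poll($read, $error, $reject, 5) < 1) {
        continue;
    }
    foreach ($read as $link) {
        if ($result = mysqli_reap_async_query($link)) {
            if ($result instanceof mysqli_result) {
                mysqli_free_result($result);
            }
            unset($pending[array_search($link, $pending, true)]);
        }
    }
}

// ~2s total if the queries ran in parallel, ~4s if serialized.
printf("Total: %.1fs\n", microtime(true) - $start);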
My queries
Here are the queries for my 3 endpoints. Note that the timings listed are the response times for the full HTTP pipeline (from browser request to response), so they include overhead from nginx and PHP, plus any post-processing of the query done in PHP. That said, the query in endpoint 2 makes up 99% of that endpoint's runtime, and it locks the database so that endpoints 1 and 3 are queued up instead of returning quickly.
endpoint1 (50ms)
SELECT * FROM Widget WHERE id = 1 LIMIT 1
(Note that 50ms is the full response time for the endpoint, not how long the query takes. This query is clearly on the order of microseconds)
endpoint2 (1000ms)
USE platypus;
SELECT `t`.`(49 fields)` AS `t0_cX`,
`site`.`(29 fields)` AS `t2_cX`,
`customer`.`(26 fields)` AS `t4_cX`,
`domain`.`(20 fields)` AS `t6_c0`,
`domain-general_settings`.`(18 fields)` AS `t8_cX`,
`domain-access_settings`.`(17 fields)` AS `t9_cX`,
`customer-general_settings`.`(18 fields)` AS `t10_cX`,
`customer-access_settings`.`(17 fields)` AS `t11_cX`,
`site-general_settings`.`(18 fields)` AS `t12_cX`,
`site-access_settings`.`(17 fields)` AS `t13_cX`,
`backup_broadcast`.`(49 fields)` AS `t14_cX`,
`playlists`.`(11 fields)` AS `t16_cX`,
`section`.`(10 fields)` AS `t17_cX`,
`video`.`(16 fields)` AS `t18_cX`,
`general_settings`.`(18 fields)` AS `t19_cX`,
`access_settings`.`(17 fields)` AS `t20_cX`
FROM `broadcast` `t`
LEFT OUTER JOIN `site` `site`
ON ( `t`.`site_id` = `site`.`id` )
LEFT OUTER JOIN `customer` `customer`
ON ( `site`.`customer_id` = `customer`.`id` )
LEFT OUTER JOIN `domain` `domain`
ON ( `customer`.`domain_id` = `domain`.`id` )
LEFT OUTER JOIN `generalsettings` `domain-general_settings`
ON ( `domain`.`general_settings_id` =
`domain-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `domain-access_settings`
ON
( `domain`.`access_settings_id` = `domain-access_settings`.`id` )
LEFT OUTER JOIN `generalsettings` `customer-general_settings`
ON ( `customer`.`general_settings_id` =
`customer-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `customer-access_settings`
ON ( `customer`.`access_settings_id` =
`customer-access_settings`.`id` )
LEFT OUTER JOIN `generalsettings` `site-general_settings`
ON ( `site`.`general_settings_id` =
`site-general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `site-access_settings`
ON ( `site`.`access_settings_id` =
`site-access_settings`.`id` )
LEFT OUTER JOIN `broadcast` `backup_broadcast`
ON ( `t`.`backup_broadcast_id` = `backup_broadcast`.`id` )
AND ( backup_broadcast.deletion IS NULL )
LEFT OUTER JOIN `playlist_broadcast` `playlists_playlists`
ON ( `t`.`id` = `playlists_playlists`.`broadcast_id` )
LEFT OUTER JOIN `playlist` `playlists`
ON
( `playlists`.`id` = `playlists_playlists`.`playlist_id` )
LEFT OUTER JOIN `section` `section`
ON ( `t`.`section_id` = `section`.`id` )
LEFT OUTER JOIN `video` `video`
ON ( `t`.`video_id` = `video`.`id` )
AND ( video.deletion IS NULL )
LEFT OUTER JOIN `generalsettings` `general_settings`
ON ( `t`.`general_settings_id` = `general_settings`.`id` )
LEFT OUTER JOIN `accesssettings` `access_settings`
ON ( `t`.`access_settings_id` = `access_settings`.`id` )
WHERE
(
(
t.id IN (
SELECT `broadcast`.id FROM broadcast
LEFT JOIN `mediashare` `shares`
ON ( `shares`.`media_id` = `broadcast`.`id` )
AND `shares`.media_type = 'Broadcast'
WHERE
(
(
broadcast.site_id IN(
'489', '488', '253', '1083', '407'
)
OR
shares.site_id IN(
'489', '488', '253', '1083', '407'
)
)
)
)
)
AND
(
(
(
(t.deletion IS NULL)
)
)
AND
(
IF(
t.backup_mode IS NULL,
t.status,
IF(
t.backup_mode = 'broadcast',
backup_broadcast.status,
IF(
t.backup_mode = 'embed',
IF(
t.backup_embed_status,
t.backup_embed_status,
IF(
'2020-01-08 16:34:52' < t.date,
1,
IF(
t.date > Date_sub(
'2020-01-08 16:34:52',
INTERVAL IF(t.expected_duration IS NULL, 10800, t.expected_duration) second
),
10,
12
)
)
),
t.status
)
)
) != 0
)
)
)
LIMIT 10;
This query takes roughly 1000ms to run, but the PHP for the endpoint is extremely simple (run the query, return the results as JSON) and only adds a couple of milliseconds of overhead.
endpoint3 (100ms)
SELECT * FROM platypus.Broadcast
WHERE deletion IS NULL
AND site_id IN (SELECT id FROM platypus.Site
WHERE deletion IS NULL
AND customer_id = 7);
There's additional validation on the PHP side here which makes this endpoint take 100ms. The SQL, as you can see, is still fairly simple.
Create Table Statements
As there is a post length limit on Stack Overflow, I cannot show the CREATE TABLE statement for every table touched by endpoint 2, but I can show at least one. The others use the same engine.
CREATE TABLE `Widget` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`widget_name` varchar(255) NOT NULL,
`widget_description` varchar(255) NOT NULL,
`status` varchar(255) NOT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`auto_play` varchar(255) NOT NULL,
`on_load_show` varchar(255) NOT NULL,
`widget_content_source` varchar(255) NOT NULL,
`associated_sites` text NOT NULL,
`author_id` int NOT NULL,
`associated_sections` text,
`after_date` datetime DEFAULT NULL,
`before_date` datetime DEFAULT NULL,
`show_playlists` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`is_classic` tinyint(1) NOT NULL,
`default_site` int unsigned DEFAULT NULL,
`auth_code_url` varchar(255) DEFAULT NULL,
`widget_layout_id` int unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_Widget_widget_layout_id_WidgetLayout_id` (`widget_layout_id`),
CONSTRAINT `fk_Widget_widget_layout_id_WidgetLayout_id` FOREIGN KEY (`widget_layout_id`) REFERENCES `WidgetLayout` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1412 DEFAULT CHARSET=utf8
Note
Notice that endpoint 2 doesn't even touch the Widget table, yet endpoint 1 (which ONLY touches the Widget table) is also queued up. This rules out table locking as the cause.
When observing the process list in MySQL, only one connection is ever made from the application's user. The issue may therefore lie in my PHP configuration.
Explain for query 2
Attached is the EXPLAIN SELECT ... output for the endpoint 2 query.
Simpler experiments
To try and determine where the parallel pipeline was falling apart, I created two simple scripts:
sleep.php
<?php
sleep(5);
echo "Done sleeping";
return.php
<?php
echo "That's all";
Doing this (sleeping in PHP) and running my script to hit these two endpoints with 3 threads, I saw no issues: return.php always came back in ~11 milliseconds, despite sleep.php taking 5066ms on average. I then tried doing the sleeping in MySQL:
sleep.php
<?php
$pdo = new PDO("...", "user", "pass");
$pdo->query("DO SLEEP(5)");
echo "Done sleeping";
This, again, had no issues. The sleeping endpoint did not block the non-sleeping one.
This means the issue does not exist at the nginx, PHP, or PDO level; there must instead be some kind of table or row locking going on. I'm going to re-enable the general query log and scan through every query being performed to see if I can figure out what's happening.
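(Re-enabling the general log from a one-off script looks something like this; it assumes a user privileged to set global variables, and the log file path is arbitrary:)
<?php
// Turn on MySQL's general query log so every statement is recorded.
$pdo = new PDO("mysql:host=localhost", "root", "password");
$pdo->exec("SET GLOBAL general_log_file = '/tmp/mysql-general.log'");
$pdo->exec("SET GLOBAL general_log = 'ON'");
echo "General query log enabled";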
Final Update
If you scroll up to "Recent Discovery" at the top of this post, you'll notice that I've modified my understanding of the problem.
I was not having an issue with parallelization, but with JMeter. I have failed to create a simple repro case, but I know now that the issue does not lie with my application but rather with how I'm profiling it.