
I'm running a sysbench OLTP benchmark with TiDB on a GKE local SSD disk, but I'm getting poor performance compared to a GKE persistent SSD disk. How can I get the expected IOPS on the GKE local SSD disk by default?
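For context, the OLTP load was generated with sysbench against TiDB's MySQL-compatible endpoint. The invocation looked roughly like the following (the host, credentials, table count, and table size are placeholders, not the exact values I used):

# prepare the test tables, then run the read/write OLTP workload (port 4000 is TiDB's default MySQL port)
sysbench oltp_read_write --db-driver=mysql --mysql-host=<tidb-host> --mysql-port=4000 \
  --mysql-user=root --mysql-db=sbtest --tables=16 --table-size=1000000 prepare
sysbench oltp_read_write --db-driver=mysql --mysql-host=<tidb-host> --mysql-port=4000 \
  --mysql-user=root --mysql-db=sbtest --tables=16 --table-size=1000000 \
  --threads=64 --time=300 --report-interval=10 run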

I've run the TiDB OLTP benchmark and an fio benchmark with the psync engine, and both results show that IOPS on the local SSD disk is lower than on the persistent SSD disk. I've also run a thorough blktrace analysis. The fio command I ran is:

fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=write -size=10G -filename=test -name="max throughput" -iodepth=1 -runtime=60 -numjobs=4 -group_reporting

The fio benchmark results for the local SSD disk and the persistent SSD disk are:

| disk type           | IOPS | bandwidth |
|---------------------|------|-----------|
| local SSD disk      |  302 | 9912 kB/s |
| persistent SSD disk | 1149 | 37.7 MB/s |
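
The blktrace data was captured and summarized roughly like this (assuming the local SSD is /dev/sdd, which is what device (8, 48) in the btt output maps to; adjust the device path as needed):

# trace the block device for 60 seconds while the fio workload runs
sudo blktrace -d /dev/sdd -o trace -w 60
# merge the per-CPU trace files into a binary dump and summarize it with btt
blkparse -i trace -d trace.bin > /dev/null
btt -i trace.bin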

And the blktrace btt result is:

==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000000002   0.003716416  14.074086987       34636
Q2G               0.000000236   0.000005730   0.005347758       25224
G2I               0.000000727   0.000005446   0.002450425       20575
Q2M               0.000000175   0.000000716   0.000027069        9447
I2D               0.000000778   0.000003197   0.000111657       20538
M2D               0.000001941   0.000011350   0.000431655        9447
D2C               0.000065510   0.000182827   0.001366980       34634
Q2C               0.000072793   0.001181298   0.023394568       34634

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8, 48) |   0.3532%   0.2739%   0.0165%   0.1605%  15.4768%
---------- | --------- --------- --------- --------- ---------
   Overall |   0.3532%   0.2739%   0.0165%   0.1605%  15.4768%

Following the local SSD optimization guide, I manually remounted the disk with the nobarrier option.
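The remount was done along these lines (the mount point below is illustrative, not necessarily the path on the GKE node):

# remount the local SSD filesystem with write barriers disabled
sudo mount -o remount,nobarrier /mnt/disks/ssd0

With nobarrier, the blktrace btt result looks much more reasonable: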

==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000000006   0.000785969  12.031454829      123537
Q2G               0.000003929   0.000006162   0.005294881       94553
G2I               0.000004677   0.000029263   0.004555917       94553
Q2M               0.000004069   0.000005337   0.000328930       29019
I2D               0.000005166   0.000020476   0.001078527       94516
M2D               0.000012816   0.000056839   0.001113739       29019
D2C               0.000081435   0.000358712   0.006724447      123535
Q2C               0.000113965   0.000415489   0.006763290      123535

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8, 48) |   1.1351%   5.3907%   0.3017%   3.7705%  86.3348%
---------- | --------- --------- --------- --------- ---------
   Overall |   1.1351%   5.3907%   0.3017%   3.7705%  86.3348%

However, according to Red Hat's documentation, write barriers themselves should have only a very small performance cost (about 3%), so disabling them with nobarrier should gain little, and the option should never be used on storage configured on virtual machines:

> The use of nobarrier is no longer recommended in Red Hat Enterprise Linux 6 as the negative performance impact of write barriers is negligible (approximately 3%). The benefits of write barriers typically outweigh the performance benefits of disabling them. Additionally, the nobarrier option should never be used on storage configured on virtual machines.

In addition to the nobarrier option, the local SSD disk optimization guide also suggests installing the Linux Guest Environment, stating that it is already included on newer VM images. However, I found that it was not installed on the GKE node.
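
Checking and installing it can be done roughly as follows (this assumes a Debian/Ubuntu-based node image; package and service names may differ by image version):

# check whether the guest environment packages are present (Debian/Ubuntu image assumed)
dpkg -l | grep google-compute-engine
# install if missing; on newer images the guest agent ships as google-guest-agent instead
sudo apt-get update && sudo apt-get install -y google-compute-engine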

So I manually installed the Linux Guest Environment and tested again. This time the btt result looks as expected:

==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000000001   0.000472816  21.759721028      301371
Q2G               0.000000215   0.000000925   0.000110353      246390
G2I               0.000000279   0.000003579   0.003997348      246390
Q2M               0.000000175   0.000000571   0.000106259       54982
I2D               0.000000609   0.000002635   0.004064992      246390
M2D               0.000001400   0.000005728   0.000509868       54982
D2C               0.000051100   0.000451895   0.009107264      301372
Q2C               0.000054091   0.000458881   0.009111984      301372

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8, 80) |   0.1647%   0.6376%   0.0227%   0.4695%  98.4778%
---------- | --------- --------- --------- --------- ---------
   Overall |   0.1647%   0.6376%   0.0227%   0.4695%  98.4778%

So how can I get the expected IOPS performance on the GKE local SSD disk by default, without this extra tuning?

tennix
  • I think this might be better asked over on https://serverfault.com/ but one quick tip: it's really useful to include fio's full output when you're asking questions involving it. Also note that syncing after every block is going to be extremely slow, and your disks may well be able to sustain more than 4 32k blocks at once... – Anon Feb 03 '19 at 17:55
  • Thank you for your suggestion. As I mentioned in the question, the Red Hat documentation says the performance impact of the `nobarrier` mount option is negligible, but what I observed on the GKE local SSD disk is that the impact is huge. – tennix Feb 11 '19 at 02:52
  • I'd imagine that would be the typical impact but I'm sure you can construct cases that are pathological and demonstrate a high impact. Generally any program should be trying to sync as little as possible (such that it is still safe etc.) precisely because it has a performance impact. If you're syncing every minute then yes the impact of `nobarrier` is going to be minimal... – Anon Feb 11 '19 at 06:31
  • @tennix have you found a solution to this issue in the meantime? I am also seeing worse read speeds in real world use cases. – Lukas Geiger Jul 24 '19 at 22:55
  • @tennix how did you mount the devices? I mounted 16 into raid-0 and achieved expected performance – Michael Ramos Aug 14 '20 at 00:22
