
We dumped our table per Google Cloud SQL instructions and imported it into a second generation Google Cloud SQL instance.

We were very excited to see how our numbers would look running on "Google hardware".

After stress testing our Rails app with Apache ab and seeing completion times about 150 ms higher, we noticed ActiveRecord was taking 30-50 ms longer than on our production server (bare metal) for the same pages.

As we dug deeper, what really blew our minds were simple count queries like this:

GOOGLE CLOUD SQL - db-n1-standard-4 (4 vCPUs and 15 GB RAM)

1. Cold query

mysql> SELECT COUNT(*) FROM `event_log`;
+----------+
| COUNT(*) |
+----------+
|  3998050 |
+----------+
1 row in set (19.26 sec)

2. Repeat query

mysql> SELECT COUNT(*) FROM `event_log`;
+----------+
| COUNT(*) |
+----------+
|  3998050 |
+----------+
1 row in set (1.16 sec)

SELECT @@innodb_buffer_pool_size/1024/1024/1024;
+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
|                          10.500000000000 |
+------------------------------------------+
1 row in set (0.00 sec)


I can then repeat the query multiple times and the performance is the same.

Running the same query on my MacBook Pro 2017 against the exact same dump:

MACBOOK PRO 2017

1. Cold query

mysql> SELECT COUNT(*) FROM `event_log`;
+----------+
| COUNT(*) |
+----------+
|  3998050 |
+----------+
1 row in set (1.51 sec)

2. Repeat query

mysql> SELECT COUNT(*) FROM `event_log`;
+----------+
| COUNT(*) |
+----------+
|  3998050 |
+----------+
1 row in set (0.51 sec)

SELECT @@innodb_buffer_pool_size/1024/1024/1024;
+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
|                           0.125000000000 |
+------------------------------------------+
1 row in set (0.03 sec)



What makes it even more absurd is that, as you can see above, I haven't tuned anything on my default MySQL install, so the InnoDB buffer pool is only 128 MB on my MacBook, while the Google Cloud instance has a 10.5 GB buffer pool available.
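As a side note, for anyone comparing their own instances, here is a rough way to check whether the table is actually being served from the buffer pool rather than disk (a sketch using standard MySQL status counters, not output we captured above):

-- Logical reads vs. reads that had to go to disk; compare before and after the repeat query
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Buffer pool occupancy (MySQL 5.6+)
SELECT POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;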

We tried increasing the Google Cloud SQL instance size up to db-n1-highmem-8 (8 vCPUs with 52 GB RAM!) with no increase in performance (although if we go below db-n1-standard-4 we do see a drop in performance).

Last but not least, using this question we can confirm that our database is only 46 GB, but during the import the storage usage on Google Cloud SQL kept growing until it reached an absurd 74 GB... we don't know if that's because of binary logging (which is ON by default on Google Cloud SQL and off on my local machine).
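For reference, a rough way to check binary logging status and how much disk the binlogs are using from the mysql client (a sketch; file names and sizes will differ, and SHOW BINARY LOGS requires the REPLICATION CLIENT privilege):

SHOW VARIABLES LIKE 'log_bin';   -- ON/OFF
SHOW BINARY LOGS;                -- lists each binlog file with its size in bytes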

So... isn't anyone using Google Cloud SQL in production? :)

UPDATE: we loaded the exact same .sql dump into a db.r4.large AWS RDS instance (so comparable CPU/RAM) and got a consistent 0.50 s on the query, and it also didn't consume more than 46 GB of storage on the instance.

sandre89
  • Binlog is the culprit https://superuser.com/questions/848514/does-binary-logging-for-mysql-slow-down-performance – Hackerman Feb 03 '19 at 13:28
  • @Hackerman no it's not. We turned off binary logging (it's a simple checkbox in the console), and after the reboot these are the results: first COUNT (I call it the 'cold query'): 9.84 sec; subsequent counts (tried > 10 times with consistent results): 1.14 seconds on average. – sandre89 Feb 03 '19 at 13:42
  • Hey man, I'm experiencing the same thing. Any solutions? In my case, I'm on an external server with DigitalOcean and it's super slow when I connect my Google Cloud SQL instance to my application. – Nico Zarris Feb 04 '19 at 21:37
  • @NicoZarris since we were comparing AWS vs. Google Cloud, we just decided to go with AWS for now and didn't dig further (see the updated question; on AWS we had optimal performance right out of the box). What sucks hard is that for South America the bandwidth cost on AWS is more than 2x that of GC. – sandre89 Feb 06 '19 at 11:48
  • Also, Google Cloud SQL for MySQL uses GTID replication by default, which could definitely be a huge factor in the slow performance. https://cloud.google.com/sql/docs/mysql/1st-2nd-gen-differences – Raj Jul 04 '19 at 05:23

1 Answer


Compare the execution plans (prepending EXPLAIN) and you'll likely find some notable implementation differences resulting from variations in configuration parameters beyond the buffer pool size.
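For example (a sketch, reusing the event_log table from the question), run the same statement with EXPLAIN on both instances and compare:

EXPLAIN SELECT COUNT(*) FROM `event_log`;

-- MySQL 5.6.5+ can also emit per-step cost estimates:
EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM `event_log`;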

I encountered similar issues setting up a Postgres Cloud SQL db over the weekend with ~100 GB of data, mirroring a local db on my MacBook Pro. Performance was comparable to my local db for very targeted selects using indices, but queries that scanned non-trivial amounts of data were 2-5x slower.

Comparing the output of SHOW ALL (SHOW VARIABLES in MySQL, I think) between the local and cloud instances, I noticed several differences, such as max_parallel_workers_per_gather = 0 on Cloud SQL vs. 2 on my local instance.
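On the MySQL side of the original question, a similar comparison could look roughly like this (a sketch; the variable list is just a starting point, not an exhaustive set):

SHOW VARIABLES WHERE Variable_name IN
  ('innodb_buffer_pool_size', 'innodb_flush_log_at_trx_commit',
   'innodb_io_capacity', 'sync_binlog', 'tmp_table_size');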

In the case of a select count(*)... a max_parallel_workers_per_gather setting > 0 allows the use of a Gather over the results of parallel sequential scans using multiple workers; when set to zero the engine has to perform a single sequential scan. For other queries I noticed similar trends where parallel workers were used in my local db, with lower costs and faster speeds than the cloud instance.
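To illustrate (a Postgres sketch, borrowing the event_log table name from the question), you can toggle the setting per session and watch the plan change:

SET max_parallel_workers_per_gather = 0;
EXPLAIN SELECT COUNT(*) FROM event_log;   -- aggregate over a single sequential scan

SET max_parallel_workers_per_gather = 2;
EXPLAIN SELECT COUNT(*) FROM event_log;   -- Finalize Aggregate over a Gather of parallel seq scans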

That's just one contributing factor; I'm sure digging into settings would turn up many more such explanations. These are the tradeoffs that come with managed services (though it'd be nice to have more control over such parameters).

thomas