
I'm working on a migration from MySQL to Postgres on a large Rails app, and most operations are performing at a normal rate. However, we have a particular operation that generates job records every 30 minutes or so. There are usually about 200 records generated and inserted, after which separate workers on another server pick up the jobs and work on them.

Under MySQL it takes about 15 seconds to generate the records, and then another 3 minutes for the worker to perform and write back the results, one at a time (so 200 more updates to the original job records).

Under Postgres it takes around 30 seconds, and then another 7 minutes for the worker to perform and write back the results.

The table being written to has roughly 2 million rows, and its ID column is backed by a sequence.

I have tried tweaking checkpoint timeouts and sizes with no luck.
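
For reference, the checkpoint parameters in question (on RDS they're changed through the DB parameter group rather than by editing postgresql.conf directly) can be inspected from psql:

    -- Checkpoint-related parameters I've been experimenting with.
    SHOW checkpoint_timeout;
    SHOW checkpoint_segments;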

The table is heavily indexed, and the indexing really shouldn't be any different from what it was before.

I can't post code samples as it's a huge codebase, and without posting pages and pages of code it wouldn't make sense.

My question is, can anyone think of why this would possibly be happening? There is nothing in the Postgres log and the process of creating these objects has not changed really. Is there some sort of blocking synchronous write behavior I'm not aware of with Postgres?

I've added all sorts of logging in my code to spot errors or transaction failures, but I'm coming up with nothing; it just takes twice as long to run, which doesn't seem right to me.

The Postgres instance is hosted on AWS RDS on an m3.medium instance type.

We also use New Relic, and it's showing nothing of interest here, which is surprising.

Cheyne
  • If you can't give details of (1) what the queue is, (2) how jobs are created or (3) what the worker does, how on earth can anyone tell what is different? – Richard Huxton Aug 12 '14 at 21:52
  • The worker performs local system tasks and simply makes return API calls with the job information when done to update the individual records. The jobs are created by a scheduler every hour: a controller method is hit and a dynamic set of tasks is calculated and inserted into the table in a "pending" state. The worker picks these up via an API call. I doubt anyone will be able to give me an answer, but I would like some suggestions or maybe some direction around Postgres usage in this scenario. It's been mentioned that indexes work differently on Postgres? – Cheyne Aug 12 '14 at 23:30
  • Have you tried tuning Postgres (by editing postgresql.conf) at all? Initial settings are very low. Also, if you used MySQL MyISAM tables, Postgres is going to be slower because it offers several features that MyISAM tables do not. – Alexandros Aug 12 '14 at 23:56
  • Yes, I've tweaked the checkpoint timeouts and the checkpoint segment sizes; it doesn't seem to help. Our MySQL DB is all InnoDB. They're both running on AWS RDS, so it should really already be tuned by AWS to a decent degree. I expected it to be slightly slower, but not 100% slower. That's a show stopper unfortunately. – Cheyne Aug 13 '14 at 00:04

1 Answer


Why does your job queue contain 2 million rows? Are they all live, or have you not moved them to an archive table to keep your reporting simpler?

Have you used EXPLAIN on your SQL from a psql prompt or your preferred SQL IDE/tool?
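
For example, something along these lines (the table and column names here are placeholders; run it against the actual SQL that ActiveRecord generates, which you can copy out of the Rails log):

    -- Hypothetical job-status update of the kind the worker issues 200 times.
    -- Wrapped in BEGIN ... ROLLBACK so EXPLAIN ANALYZE doesn't keep the change.
    BEGIN;
    EXPLAIN ANALYZE
    UPDATE jobs SET status = 'complete', updated_at = now() WHERE id = 12345;
    ROLLBACK;

    -- And the query that selects the pending jobs for the worker.
    EXPLAIN ANALYZE
    SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 200;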

Postgres is a completely different RDBMS than MySQL. It allocates and manipulates storage differently, so your data may need to be indexed differently.
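
A quick way to see whether those indexes are actually being used is the statistics views (the table name "jobs" is an assumption; substitute your own):

    -- Sequential vs. index scans on the job table.
    SELECT relname, seq_scan, idx_scan, n_live_tup
    FROM pg_stat_user_tables
    WHERE relname = 'jobs';

    -- Per-index usage; indexes that are never scanned still have to be
    -- maintained on every one of those inserts and updates.
    SELECT indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE relname = 'jobs'
    ORDER BY idx_scan;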

Additionally, there's a tool called pgtune that will suggest configuration changes.


edit: 2014-08-13

Also, Rails comes with a profiler that might add some insight. Here's a StackOverflow thread about Rails profiling.

You also want to watch your DB server at the disk IO level. Does your job fulfillment result in a large number of updates? Postgres creates a new row version when you update an existing row and marks the old version as dead (its space is only reclaimed later by vacuum), instead of overwriting the existing row in place. So you may be seeing a lot more IO as a result of your RDBMS switch.
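
One way to check whether those updates are leaving a lot of dead rows behind and whether autovacuum is keeping up (again, "jobs" is a placeholder table name):

    -- Dead tuples, HOT-update counts and last autovacuum time for the table.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           n_tup_upd,
           n_tup_hot_upd,   -- HOT updates avoid touching the indexes
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'jobs';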

Len Jaffe
  • 2 million rows is roughly one month's data, which is required for use in the application; we archive anything older than that. I'm curious to know more about when you say it needs to be indexed differently, though. How do you mean? I'll check out pgtune, thanks for that – Cheyne Aug 12 '14 at 23:03
  • You use the EXPLAIN [http://www.postgresql.org/docs/9.2/static/sql-explain.html] command in psql to have Postgres tell you how it plans to execute your query. If you see a lot of sequential scans, you might want to index your data differently. – Len Jaffe Aug 13 '14 at 14:33
  • Are you saying that some part of your app today will need to use data from the job queue from three weeks ago? – Len Jaffe Aug 13 '14 at 14:34
  • Correct. The job status information is blended with other info and trended along event timelines and charts (a bad job status indicates an unhealthy system), so it's important that we keep a month up front for on-demand use. – Cheyne Aug 13 '14 at 15:48
  • OK. Then I'm back to recommending you run EXPLAIN on each step of the job record generation and fulfillment. That's the only real way to know where your database bottleneck is. – Len Jaffe Aug 13 '14 at 15:59
  • Thanks Len, I have a suspicion it could be something to do with the way the data was migrated across, so I might look into that. Thanks for your suggestions, I'll try those out too. I'll mark your response as the answer because you provided a few things I can use. Thanks again – Cheyne Aug 13 '14 at 16:27
  • Keep me posted. I'm fascinated by profiling and tuning exercises like this. – Len Jaffe Aug 13 '14 at 17:49