
We've been running this DMS replication instance for ~2 months now, and it fails from time to time with the following error:

Last Error: Replication task out of memory. Stop Reason: FATAL_ERROR. Error Level: FATAL.

Only 1 task is running. It performs a full load plus ongoing changes (CDC) for 5 tables (~5M rows) from RDS Postgres into an S3 bucket. The replication instance is a dms.r5.xlarge on DMS version 3.4.3. Every time it fails we just upgrade the instance size (following solutions we've seen online), but I'm afraid this problem will simply persist.

Based on the CloudWatch metrics, free memory, freeable memory, and available memory all decrease over time until the task crashes. Swap usage stays at zero until just before the crash, when it spikes.

[Screenshot: CloudWatch Freeable Memory graph]
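Because the decline is described as smooth and roughly steady, a simple linear fit over FreeableMemory samples gives a rough estimate of how fast memory is being lost and when the trend would reach zero. This is a minimal sketch under that assumption; the samples below are hypothetical, and in practice they could be pulled from CloudWatch (e.g. boto3's `get_metric_statistics` on the `AWS/DMS` namespace, metric `FreeableMemory`, dimension `ReplicationInstanceIdentifier`). The helper names are made up for illustration.

```python
from datetime import datetime, timedelta


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_exhaustion(samples):
    """Estimate the memory loss rate and when the fitted trend hits zero.

    samples: list of (datetime, freeable_bytes), oldest first.
    Returns (bytes_lost_per_hour, projected_datetime), or None if the
    fitted trend is flat or rising.
    """
    t0 = samples[0][0]
    hours = [(t - t0).total_seconds() / 3600 for t, _ in samples]
    values = [v for _, v in samples]
    slope, intercept = linear_fit(hours, values)
    if slope >= 0:
        return None
    hours_to_zero = -intercept / slope
    return -slope, t0 + timedelta(hours=hours_to_zero)


if __name__ == "__main__":
    GIB = 1024 ** 3
    start = datetime(2021, 3, 15)
    # Hypothetical samples: 30 GiB free, dropping 0.2 GiB per hour.
    samples = [(start + timedelta(hours=h), 30 * GIB - 0.2 * GIB * h)
               for h in range(0, 49, 6)]
    rate, when = project_exhaustion(samples)
    print(f"losing ~{rate / GIB:.2f} GiB/hour; trend reaches zero around {when}")
```

If the projected runway grows roughly in proportion to instance memory after each resize, that fits the observation that upgrading only buys more time.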

Does anyone know what might cause this? Are we simply using an instance size that's too small? Our configuration seems pretty basic.

Ashley
  • What LOB mode are you using, and do these tables have large object columns? Roughly how much data is in them, and what throughput do you expect from ongoing changes to the database? How long does it typically take before it crashes, and is the memory consumption a smooth descent or more spiky? Does it fail during the full load or later, during ongoing changes? – JD D Mar 22 '21 at 00:16
  • @JDD We left it on the default Limited LOB mode, although I've checked our Postgres DB and we don't have any LOBs. We're looking to gradually set up tasks to migrate the entire DB, which is around ~300GB; a pipeline ingests the data from S3 hourly. Throughput right now is great; it's just that the memory runs out. It decreases smoothly over time (with some larger drops here and there) until it runs out after ~7 days or so while replicating ongoing changes. Increasing the instance size has only given us more runway before it crashes. – Ashley Mar 22 '21 at 14:19
  • Post your replication settings! If you changed InlineLobMaxSize or LobChunkSize, you're in for a world of hurt. – John Jones Jan 31 '23 at 23:41
  • Does this answer your question? [AWS DMS replication instance out of memory](https://stackoverflow.com/questions/56099745/aws-dms-replication-instance-out-of-memory) – John Jones Jan 31 '23 at 23:42
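To act on the request for replication settings, the LOB-related keys can be pulled out of the task-settings JSON (for example the `ReplicationTaskSettings` string returned by `aws dms describe-replication-tasks`) and compared with a baseline. This is a sketch only: the baseline values are assumed defaults, not confirmed for DMS 3.4.3, and should be checked against the task-settings documentation for your version. The function names are hypothetical.

```python
import json

# LOB-related keys that live under "TargetMetadata" in DMS task settings.
LOB_KEYS = (
    "SupportLobs",
    "FullLobMode",
    "LimitedSizeLobMode",
    "LobMaxSize",
    "InlineLobMaxSize",
    "LobChunkSize",
)

# ASSUMPTION: values believed to be the DMS defaults; verify them against
# the task-settings documentation for your DMS engine version.
ASSUMED_BASELINE = {
    "SupportLobs": True,
    "FullLobMode": False,
    "LimitedSizeLobMode": True,
    "LobMaxSize": 32,        # KB
    "InlineLobMaxSize": 0,   # KB
    "LobChunkSize": 64,      # KB
}


def lob_settings(task_settings_json):
    """Pull the LOB-related values out of a task-settings JSON string."""
    target = json.loads(task_settings_json).get("TargetMetadata", {})
    return {key: target.get(key) for key in LOB_KEYS}


def changed_lob_settings(task_settings_json, baseline=ASSUMED_BASELINE):
    """Return {key: (baseline_value, actual_value)} for each LOB key that differs."""
    actual = lob_settings(task_settings_json)
    return {
        key: (baseline.get(key), value)
        for key, value in actual.items()
        if value != baseline.get(key)
    }
```

An empty result means the LOB settings match the assumed baseline; any entry, such as a raised `InlineLobMaxSize`, is worth a closer look.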

1 Answer


It may help to review the following article:

https://aws.amazon.com/blogs/database/debugging-your-aws-dms-migrations-what-to-do-when-things-go-wrong-part-1/

In general, ongoing replication is a memory-intensive operation. You may need to increase the memory available to your replication instance to match your workload.
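Alongside resizing the instance, the task itself has change-processing settings that control how long in-flight transactions are held in memory before DMS writes them to disk. The sketch below builds a settings fragment touching only `ChangeProcessingTuning.MemoryLimitTotal` and `MemoryKeepTime`; the defaults are believed to be 1024 MB and 60 seconds, and the values passed in are purely illustrative. Lowering them trades memory pressure for more disk I/O, and whether this helps with a slow, steady decline like the one described is an assumption to test, not a known fix.

```python
import json


def change_processing_settings(memory_limit_total_mb=1024, memory_keep_time_s=60):
    """Build a task-settings fragment that only sets the CDC memory knobs.

    The numbers are illustrative; choose values based on your own
    instance size and change volume.
    """
    if memory_limit_total_mb <= 0 or memory_keep_time_s <= 0:
        raise ValueError("memory limit and keep time must be positive")
    return {
        "ChangeProcessingTuning": {
            # Max total size (MB) that in-flight transactions may occupy
            # in memory before DMS writes them to disk.
            "MemoryLimitTotal": memory_limit_total_mb,
            # Max time (seconds) a transaction may stay in memory before
            # being written to disk.
            "MemoryKeepTime": memory_keep_time_s,
        }
    }


if __name__ == "__main__":
    print(json.dumps(change_processing_settings(512, 60), indent=2))
```

The printed JSON could be saved to a file and applied with `aws dms modify-replication-task --replication-task-arn <task-arn> --replication-task-settings file://settings.json`; the task typically has to be stopped first, and it is worth confirming whether a partial document is merged with the existing settings or whether the full settings JSON should be edited instead.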

Another strategy would be to split the replication across multiple replication instances. Since you have 5 tables, you could create a set of smaller replication instances, each handling a single table or a smaller subset of tables.
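One way to sketch that split: greedily assign tables to groups so each group carries a similar row count, then emit a DMS table-mapping document (selection rules) per group for a separate task. The table names, row counts, and the `public` schema are hypothetical, and balancing by row count is only a proxy; for CDC memory, per-table change volume is probably the better weight.

```python
import json


def partition_tables(row_counts, n_groups):
    """Greedy largest-first assignment of tables to n_groups by row count."""
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for table, rows in sorted(row_counts.items(), key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))  # group with the fewest rows so far
        groups[i].append(table)
        totals[i] += rows
    return groups


def table_mapping(schema, tables):
    """Build a table-mapping document with one include rule per table."""
    rules = [
        {
            "rule-type": "selection",
            "rule-id": str(n),
            "rule-name": f"include-{table}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }
        for n, table in enumerate(tables, start=1)
    ]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # Hypothetical tables adding up to ~5M rows.
    counts = {"orders": 2_000_000, "events": 1_500_000, "customers": 800_000,
              "invoices": 500_000, "products": 200_000}
    for group in partition_tables(counts, 2):
        print(table_mapping("public", group))
```

Each printed mapping could be passed via `--table-mappings` to `aws dms create-replication-task`. Note that only tasks on separate instances get separate memory; several tasks on the same instance still share it.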

JD D
  • Thank you for this! I did in fact find that article a while back but couldn't find a solution for what we're seeing. I'll also look into your suggestion and see how the instances perform with different sets of tables. – Ashley Mar 22 '21 at 14:21