We've been running this DMS replication instance for ~2 months now, and it periodically fails with the following error:
Last Error Replication task out of memory. Stop Reason FATAL_ERROR Error Level FATAL.
Only 1 task is running, which captures full load + ongoing changes from 5 tables (~5M rows) in RDS Postgres into an S3 bucket. The replication instance is a dms.r5.xlarge on DMS version 3.4.3. Every time it fails we just upgrade the instance size (as suggested by solutions seen online), but I'm afraid the problem will just persist.
Based on the CloudWatch metrics, free memory, freeable memory, and available memory all decrease steadily over time until the task crashes. Swap usage stays at zero until right before the crash, when it spikes.
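To put a number on that decline, here is a minimal sketch that fits a straight line to freeable-memory samples and estimates how long until it reaches zero. It assumes the decline is roughly linear and that the samples were exported from CloudWatch by hand. The function names and the sample values are hypothetical and for illustration only; they are not our real measurements.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_exhausted(hours: Sequence[float],
                          free_bytes: Sequence[float]) -> Optional[float]:
    """Hours after the last sample until the fitted line hits zero.

    Returns None when memory is not trending downward.
    """
    slope, intercept = fit_line(hours, free_bytes)
    if slope >= 0:
        return None
    zero_at = -intercept / slope  # x value where the fitted line crosses zero
    return max(0.0, zero_at - hours[-1])


if __name__ == "__main__":
    # Hypothetical samples: freeable memory (bytes) taken once a day.
    sample_hours = [0, 24, 48, 72]
    sample_free = [24e9, 22e9, 20e9, 18e9]
    remaining = hours_until_exhausted(sample_hours, sample_free)
    print(f"estimated hours until exhaustion: {remaining:.0f}")
```

If the estimate stays about the same after each instance-size upgrade, relative to the extra memory added, that would suggest steady growth rather than an undersized instance.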
Does anyone know what might be causing this? Are we just using an instance size that's too small? I feel like our configuration is pretty basic.