I am working with the AWS Aurora PostgreSQL 10.4 engine. I am trying to cluster table ... using index and am getting an error like:

could not write block .... of temporary file: no space left on device

If I were managing my own PostgreSQL instance I would be looking at the space available on individual volumes with df. (See also: I get an error "could not write block .... of temporary file no space left on device ..." using postgresql)

But with Aurora, AWS should be managing the storage and automatically expanding it on demand. So I'm wondering how I would go about fixing this condition if I'm not managing the storage myself. I'm guessing that the PG engine's temp storage is separate from the Aurora-managed virtualized storage layer, but I'm not sure how to change it.

wrschneider

2 Answers

7

Temp space uses the local “ephemeral” volume on the instance. Currently the only way to increase that space is to move to a larger instance size.

Hal Berenson
  • Are there any metrics available in CloudWatch to monitor the storage? – Spike Apr 19 '19 at 03:53
  • CloudWatch->RDS->DBClusterIdentifier->FreeLocalStorage – perimeno Jul 09 '20 at 08:42
  • Too bad it also happens in Aurora Serverless. – Czechnology Feb 06 '21 at 08:06
  • Is this answer still valid in 2021? – Paulo Brito Jun 22 '21 at 19:09
  • @PauloBrito Hi from 2022. If you use aurora clusters or serverless, you don't have to move to a larger instance size per se, IME, but you still get this error sometimes. The FreeLocalStorage graph looks like a bunch of jitter, because the upsize job is trying to catch up. Wait a bit, and retry, and (again, IME) often we're fine after that. But it's an unfortunate failure nevertheless. – kojiro Oct 10 '22 at 14:37
  • Hi, @kojiro! Thanks for sharing your experiences. Just for the record, we solved our issue by strictly controlling the size of the insert batch scripts we were sending to Aurora, as part of our ETL setup. – Paulo Brito Dec 29 '22 at 00:47
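
For anyone who wants to watch the FreeLocalStorage metric mentioned in the comments from a script rather than the console, here is a minimal sketch using boto3. It is an illustration under assumptions, not a verified recipe: the helper names, the 5-minute period, and the default `DBInstanceIdentifier` dimension are my choices (pass `dimension="DBClusterIdentifier"` to follow the console path given above), and `fetch_lowest_free_gib` needs boto3 plus valid AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def free_local_storage_query(identifier, dimension="DBInstanceIdentifier",
                             hours=3, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    `identifier` and `dimension` are placeholders; adjust them to your setup.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "FreeLocalStorage",
        "Dimensions": [{"Name": dimension, "Value": identifier}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint (assumed granularity)
        "Statistics": ["Minimum"],
    }


def lowest_free_gib(datapoints):
    """Return the smallest 'Minimum' value in GiB, or None if there is no data."""
    values = [point["Minimum"] for point in datapoints]
    return min(values) / 2**30 if values else None


def fetch_lowest_free_gib(identifier, **kwargs):
    """Query CloudWatch for the lowest free local storage in the time window."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **free_local_storage_query(identifier, **kwargs)
    )
    return lowest_free_gib(response["Datapoints"])
```

Calling `fetch_lowest_free_gib("my-instance")` (a hypothetical identifier) before and during a large CLUSTER or ETL batch should show whether the local volume is what is running out.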
You're right in stating that Aurora should take care of this. If you have multiple instances in your cluster, then your cluster would self-recover by initiating a failover. The faulty instance would be repaired in the background as well - mostly automatically, and in some rare cases by AWS operators.

If you notice the issue persisting for more than a few minutes, then you should:

  1. Manually trigger a failover using the API or the console
  2. Engage AWS Support to look into the matter if it happens more often.
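
As a rough illustration of step 1 through the API, the sketch below uses the boto3 RDS call `failover_db_cluster`. The function names and the identifiers you pass in are hypothetical, and `trigger_failover` needs boto3 plus credentials that are allowed to fail over the cluster.

```python
def failover_request(cluster_id, target_instance_id=None):
    """Build the parameters for an RDS failover_db_cluster call.

    Both identifiers are placeholders for your own cluster and instance names.
    """
    params = {"DBClusterIdentifier": cluster_id}
    if target_instance_id:
        # Optionally name the instance that should be promoted to writer.
        params["TargetDBInstanceIdentifier"] = target_instance_id
    return params


def trigger_failover(cluster_id, target_instance_id=None):
    """Ask RDS to fail the cluster over and return the reported cluster status."""
    import boto3  # imported lazily; requires AWS credentials at call time

    rds = boto3.client("rds")
    response = rds.failover_db_cluster(
        **failover_request(cluster_id, target_instance_id)
    )
    return response["DBCluster"]["Status"]
```

A failover briefly interrupts connections to the writer, so it is probably best tried outside peak load.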

If you think AWS missed your SLA, then do bring it to their notice as well.

AWS will use commercially reasonable efforts to make Multi-AZ instances available with a Monthly Uptime Percentage (defined below) of at least 99.95% during any monthly billing cycle (the "Service Commitment"). In the event Amazon RDS does not meet the Monthly Uptime Percentage commitment, you will be eligible to receive a Service Credit

Doc: https://aws.amazon.com/rds/sla/

The-Big-K