
I executed the query below:

select employee_name, max(employee_dept) as dept
from employeeDB 
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000

and received the error ERROR: XX000: Disk Full.

Upon investigation, by executing the query below, I found that I have 941 GB of free space and 5000 GB of used space.

select
  sum(capacity)/1024 as capacity_gbytes, 
  sum(used)/1024 as used_gbytes, 
  (sum(capacity) - sum(used))/1024 as free_gbytes 
from 
  stv_partitions where part_begin=0;

Can anyone suggest how to mitigate this issue so that I can get the desired results?

StatguyUser
  • What size is your employeeDB table? You can check the table size from svv_table_info. What compression encodings are you using on the mentioned column? My assumption is that it has something to do with compression: your data would not fit uncompressed on the disk. – Rahul Gupta Mar 13 '17 at 14:12
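
As the comment above suggests, table size and compression encoding can be checked from svv_table_info. A minimal sketch, assuming the table is actually named employeedb (adjust the name to match your schema):

select "schema", "table", encoded, size as size_mb, pct_used, unsorted
from svv_table_info
where "table" = 'employeedb';   -- size is reported in 1 MB blocks; unsorted is a percentage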

2 Answers


Free disk space is important for query execution on Redshift. That's why the VACUUM process is important and should be executed regularly, especially for tables where deletions happen often.

Have you VACUUMed your tables lately?

Check the VACUUM documentation and also look at the "Amazon Redshift at 100% disk usage due to VACUUM query" question on Stack Overflow.
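
A minimal sketch of what that could look like, assuming the table from the question is named employeedb (VACUUM is I/O intensive, so run it during a quiet window):

vacuum full employeedb;   -- reclaim space from deleted rows and re-sort the table
analyze employeedb;       -- refresh the planner's statistics afterwards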

cpard
+-------+              +-------+
|-------|              |-------|
||10 kb||              ||25 kb||
+-------+              +-------+
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx+------------->+xxxxxxx|
+-------+              |xxxxxxx|
||10 kb||              |xxxxxxx|
+-------+              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
+-------+              |xxxxxxx|
||05 kb||              |xxxxxxx|
+-------+              +-------+

Look at the representation above. Assume that xxxxxxx represents occupied space on the disk, while the numbered blocks represent the free space available.

Both scenarios have 25 kb of vacant space. But in case 1, if you have to insert (or perform an operation) that requires a contiguous allocation of, say, 15 kb, you won't be able to do it. Although 25 kb is available, it isn't contiguous, so you might get a Memory / Disk Full error, and the space will either go to waste or be assigned only to tasks with very small memory requirements.

In case 2, a block of contiguous memory is available, so a task requiring ~25 kb of memory can easily be executed.

This isn't specific to Redshift or any DBMS; it holds true for anything that remotely involves memory management, including operating systems.

What causes such memory partitions (called Fragmentation)?

Fragmentation is caused by continuously creating and deleting (modifying) files on disk. When a file occupying some space is removed, it leaves a hole of free space behind. Only a file smaller than that hole can reuse the space; otherwise the space goes to waste.

What should be done?

Defragment! In your specific case, Amazon Redshift provides the VACUUM command for tables and/or a whole schema. You might have enough total disk space, but not enough contiguous space for the engine to allocate to the task you are executing.
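
If a full vacuum itself fails for lack of free space, a lighter-weight sketch (DELETE ONLY reclaims space from deleted rows without the re-sort phase; the table name employeedb is assumed from the question):

vacuum delete only employeedb;       -- reclaim disk space without re-sorting the table
select * from svv_vacuum_progress;   -- check the vacuum's status and estimated time remaining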

Yusuf Hassan
  • I am just querying data, and later I will export the result as a CSV. Can you suggest how to fix this issue so that I can query data efficiently? – StatguyUser Mar 15 '17 at 10:38
  • @Enthusiast: Simply try executing the above query on a different machine, preferably one with a better configuration than the one you are currently using. – Yusuf Hassan Mar 15 '17 at 19:35
  • @Enthusiast: Please mark the answer above as accepted if it's exactly what you were looking for. If it isn't exact but helped you in some way, upvote it. This way it won't be lost and will help others who have similar issues. – Yusuf Hassan Dec 16 '17 at 15:07