1

Basing on that documentation page:

https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html

For having following Apache Iceberg table:

CREATE TABLE IF NOT EXISTS my_catalog.my_database.my_table (
   id string,
   createdAt timestamp)
PARTITIONED BY (bucket(16, `id`))
LOCATION 's3://my_bucket/iceberg/my_database/my_table'
TBLPROPERTIES ('table_type'='iceberg', 'format-version' = '2', 'vacuum_max_snapshot_age_seconds' = '2419200');

I should be able to simply run following statements to vacuum and optimise it:

OPTIMIZE "my_database"."my_table" REWRITE DATA USING BIN_PACK WHERE catalog = 'my_catalog';

VACUUM "my_database"."my_table";

However instead I'm getting an error from Athena engine for VACUUM:

line 3:1: mismatched input 'VACUUM'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'PREPARE', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'UNLOAD', 'UPDATE', 'USE',

And for OPTIMIZE:

line 1:1: mismatched input 'OPTIMIZE'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'PREPARE', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'UNLOAD', 'UPDATE', 'USE',


I don't understand why it doesn't allow VACUUM and OPTIMIZE, when it's explained in docs, that it supports that.

PatrykMilewski
  • 922
  • 8
  • 17

1 Answers1

1

By default (as for 03.2023), Athena is using v2 engine. This can be changed in Athena workgroup settings:

enter image description here

Because for my case, it was stuck on "Pending automatic upgrade". You have to simply edit workgroup, select option to manually pick version and then finally select v3.

The second issue was about the query itself, while this works fine:

SELECT * FROM "my_database"."my_table"; 

It won't work for case with OPTIMIZE and VACUUM because of quotes. Running it this way will fix the issue:

OPTIMIZE my_database.my_table REWRITE DATA USING BIN_PACK;

VACUUM my_database.my_table;

And also for OPTIMIZE it's optional to specify the catalog name.

PatrykMilewski
  • 922
  • 8
  • 17