I'm trying to determine the best way to deploy some Iceberg tables into our AWS environment. Has anyone had success doing this via Terraform? I have the following configuration, but when I query the table it creates, Athena either complains about a missing metadata location or just spins forever. If there are better ways to automate this deployment, please let me know, as this could simply be a limitation of the provider.

resource "aws_glue_catalog_table" "iceberg-table" {

  name          = "sales_header"
  database_name = "sales"

  # Governed tables require this value in uppercase.
  parameters = {
    "table_type" = "ICEBERG"
    "format"     = "parquet"
  }

  storage_descriptor {
    location      = "s3://${var.s3_raw_bucket}/sales/sales_header/"

    columns {
      name    = "transaction_id"
      type    = "integer"
      comment = ""
    }
    columns {
      name    = "sale_date"
      type    = "integer"
      comment = ""
    }
    columns {
      name    = "sale_amount"
      type    = "float"
      comment = ""
    }
  }
}

I tried running the code block above and was presented with errors. Others in my org have hit the same issues, so I'm hoping some wizards in the aether can help.

  • Does this help: https://aws.amazon.com/blogs/big-data/build-a-real-time-gdpr-aligned-apache-iceberg-data-lake/? More as an example than a complete solution and maybe there are steps you need to undertake to make it work. – Marko E Feb 27 '23 at 14:58
  • No, that's a manual deployment in their article. Our teams aren't allowed to manually deploy to higher environments, it needs to be IaC or programmatic. – DataEnginerd Feb 27 '23 at 17:49
  • I meant do you need to activate the Iceberg connector for Glue from the Marketplace first, not that you do it manually. – Marko E Feb 28 '23 at 11:13
  • If I remember correctly, I tried the same thing and it complained about a missing metadata location because the Glue table points to a metadata file that was never created in S3. I guess the Terraform provider is not prepared to build the `.metadata.json` file. – pauetpupa Mar 06 '23 at 16:04
  • @pauetpupa - exactly one of the errors I've gotten. From what I've seen it looks like Athena and possibly the CLI are the only viable options for deploying Iceberg tables right now. – DataEnginerd Mar 07 '23 at 14:58
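
For anyone landing here later, a possible way to have the missing metadata file created from Terraform itself, sketched under assumptions rather than tested: newer releases of the Terraform AWS provider document an `open_table_format_input` block on `aws_glue_catalog_table`, which asks Glue to create the initial Iceberg metadata when the table is created. Whether that block is available depends on your provider version. The resource label `sales_header_iceberg` is made up for illustration, and the column types are written as `int` (the spelling Athena DDL uses) where the question has `integer`.

```hcl
# Sketch only: assumes an AWS provider release that supports the
# open_table_format_input block on aws_glue_catalog_table.
resource "aws_glue_catalog_table" "sales_header_iceberg" {
  name          = "sales_header"
  database_name = "sales"
  table_type    = "EXTERNAL_TABLE"

  # Requests that Glue create the initial Iceberg metadata file,
  # so the table gets a metadata location that Athena can read.
  open_table_format_input {
    iceberg_input {
      metadata_operation = "CREATE"
      version            = "2"
    }
  }

  storage_descriptor {
    location = "s3://${var.s3_raw_bucket}/sales/sales_header/"

    columns {
      name = "transaction_id"
      type = "int"
    }
    columns {
      name = "sale_date"
      type = "int"
    }
    columns {
      name = "sale_amount"
      type = "float"
    }
  }
}
```

If the provider version in use does not have this block, the fallback mentioned in the comments (running a `CREATE TABLE ... TBLPROPERTIES ('table_type'='ICEBERG')` statement through Athena from a pipeline step) still applies.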

0 Answers