1

I'm trying to create an Athena view managed with CloudFormation. This view contains list of nested records property.

Running a SELECT directly in Athena works fine:

SELECT
    item_id AS material_id,
    material_type AS material_type,
    material_group AS material_group,
    material_status AS x_plant_mat_stat,
    products[1].PRODUCT_NO AS product_nr,
    products[1].VERSION AS product_version,
    products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
    products[1].DELETED_DATE AS global_deleted_date,
    transform(
        warehouses,
        plant -> CAST(ROW(
            plant.WAREHOUSE,
            plant.PLANT_SPECIFIC_MAT_STATUS,
            plant.PROCUREMENT_TYPE
        ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
    ) AS plants
FROM raw_item_master LIMIT 5

But when I try following CloudFormation snippet:

    View:
        Type: "AWS::Glue::Table"
        Properties:
            CatalogId: !Ref "AWS::AccountId"
            DatabaseName: !Ref "GlueDatabaseName"
            TableInput:
                TableType: "VIRTUAL_VIEW"
                Name: "item_master"
                Parameters:
                    presto_view: true
                StorageDescriptor:
                    SerdeInfo: {}
                    Columns:
                        -
                            Name: "material_id"
                            Type: "string"
                        -
                            Name: "material_type"
                            Type: "string"
                        -
                            Name: "material_group"
                            Type: "string"
                        -
                            Name: "x_plant_mat_stat"
                            Type: "string"
                        -
                            Name: "product_nr"
                            Type: "string"
                        -
                            Name: "product_version"
                            Type: "string"
                        -
                            Name: "supply_chain_owner"
                            Type: "string"
                        -
                            Name: "global_deleted_date"
                            Type: "string"
                        -
                            Name: "plants"
                            Type: "array<struct<plant_id:string,ps_material_stat:string,proc_type:string>>"
                ViewOriginalText:
                    "Fn::Sub":
                        - "/* Presto View: ${View} */"
                        -
                            View:
                                "Fn::Base64": !Sub '
                                    {
                                        "catalog": "awsdatacatalog",
                                        "schema": "${GlueDatabaseName}",
                                        "columns": [
                                            {
                                                "name": "material_id",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "material_type",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "material_group",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "x_plant_mat_stat",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "product_nr",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "product_version",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "supply_chain_owner",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "global_deleted_date",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "plants",
                                                "type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"
                                            }
                                        ],
                                        "originalSql": "SELECT
                                                item_id AS material_id,
                                                material_type AS material_type,
                                                material_group AS material_group,
                                                material_status AS x_plant_mat_stat,
                                                products[1].PRODUCT_NO AS product_nr,
                                                products[1].VERSION AS product_version,
                                                products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
                                                products[1].DELETED_DATE AS global_deleted_date,
                                                transform(
                                                    warehouses,
                                                    plant -> CAST(ROW(
                                                        plant.WAREHOUSE,
                                                        plant.PLANT_SPECIFIC_MAT_STATUS,
                                                        plant.PROCUREMENT_TYPE
                                                    ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
                                                ) AS plants
                                            FROM ${RawTable}"
                                    }'

I got a following error in Athena:

INVALID_VIEW: Invalid view JSON: # here comes my JSON

However when I select just one property it works fine (field type as "type": "array(row(plant_id varchar))", transform as CAST(ROW(plant.WAREHOUSE) AS ROW(plant_id varchar)). View works with any property, but only one - as soon as I add two properties it break in Athena.

Rafał Wrzeszcz
  • 1,996
  • 4
  • 23
  • 45

1 Answers1

1

After creating view from Athena and extracting it with aws glue get-table I compared my input with Athena output and the only different were whitespaces in column definition.

My input (spaces after commas):

"type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"

Athena (no whitespaecs):

"type": "array(row(plant_id varchar,ps_material_stat varchar,proc_type varchar))"

After removing whitespaces it worked!

Rafał Wrzeszcz
  • 1,996
  • 4
  • 23
  • 45