1

Is it possible to create an Athena view via cloudformation template. I can create the view using the Athena Dashboard but I want to do this programmatically using CF templates. Could not find any details in AWS docs so not sure if supported.

Thanks.

Infinite
  • 704
  • 8
  • 27

4 Answers4

3

It is possible to create views with CloudFormation, it's just very, very, complicated. Athena views are stored in the Glue Data Catalog, like databases and tables are. In fact, Athena views are tables in Glue Data Catalog, just with slightly different contents.

See this answer for the full description how to create a view programmatically, and you'll get an idea for the complexity: Create AWS Athena view programmatically – it is possible to map that to CloudFormation, but I would not recommend it.

If you want to create databases and tables with CloudFormation, the resources are AWS::Glue::Database and AWS::Glue::Table.

Theo
  • 131,503
  • 21
  • 160
  • 205
2

In general, CloudFormation is used for deploying infrastructure in a repeatable manner. This doesn't apply much to data inside a database, which typically persists separately to other infrastructure.

For Amazon Athena, AWS CloudFormation only supports:

  • Data Catalog
  • Named Query
  • Workgroup

The closest to your requirements is Named Query, which (I think) could store a query that can create the View (eg CREATE VIEW...).

See: AWS::Athena::NamedQuery - AWS CloudFormation

Update: @Theo points out that AWS CloudFormation also has AWS Glue functions that include:

  • AWS::Glue::Table

This can apparently be used to create a view. See comments below.

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
  • With Named Query , I can store the query but how do I kick/execute it off from CF template ? – Infinite Sep 24 '20 at 08:10
  • 1
    I don't think you can. CloudFormation is designed to deploy infrastructure, not "data". You could create an AWS Lambda function that connects to Amazon Athena and creates the View, but I still think this is not a clever use of CloudFormation since it is not really "deploying" anything and, when the stack is deleted, I suspect you do not want it to "tear down" the view. – John Rotenstein Sep 24 '20 at 08:41
  • This answer is not correct. You can create Athena databases and tables with CloudFormation, they just happen to be in the Glue namespace since Athena uses Glue Data Catalog for such things. See [AWS::Glue::Table](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-glue-table.html) for example. Views are just a special variant of tables, and can also be created with CloudFormation, although in practice it's very hard. – Theo Sep 24 '20 at 17:01
2

I think for now the best way to create Athena view from CloudFormation template is to use Custom resource and Lambda. We have to supply methods for View creation and deletion. For example, using crhelper library Lambda could be defined:

from __future__ import print_function
from crhelper import CfnResource
import logging
import os
import boto3

logger = logging.getLogger(__name__)
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', sleep_on_delete=120)

try:
    client = boto3.client('athena')
    ATHENA_WORKGROUP = os.environ['athena_workgroup']
    DATABASE = os.environ['database']
    QUERY_CREATE = os.environ['query_create']
    QUERY_DROP = os.environ['query_drop']
except Exception as e:
    helper.init_failure(e)

@helper.create
@helper.update
def create(event, context):
    logger.info("View creation started")

    try:
        executionResponse = client.start_query_execution(
            QueryString=QUERY_CREATE,
            QueryExecutionContext={'Database': DATABASE},
            WorkGroup='AudienceAthenaWorkgroup'
        )
        logger.info(executionResponse)

        response = client.get_query_execution(QueryExecutionId=executionResponse['QueryExecutionId'])
        logger.info(response)

        if response['QueryExecution']['Status']['State'] == 'FAILED':
            logger.error("Query failed")
            raise ValueError("Query failed")

        helper.Data['success'] = True
        helper.Data['id'] = executionResponse['QueryExecutionId']
        helper.Data['message'] = 'query is running'

    except Exception as e:
        print(f"An exception occurred: {e}")

    if not helper.Data.get("success"):
        raise ValueError("Creating custom resource failed.")

    return


@helper.delete
def delete(event, context):
    logger.info("View deletion started")

    try:
        executionResponse = client.start_query_execution(
            QueryString=QUERY_DROP,
            QueryExecutionContext={'Database': DATABASE},
            WorkGroup='AudienceAthenaWorkgroup'
        )
        logger.info(executionResponse)

    except Exception as e:
        print("An exception occurred")
        print(e)

@helper.poll_create
def poll_create(event, context):
    logger.info("Pol creation")

    response = client.get_query_execution(QueryExecutionId=event['CrHelperData']['id'])

    logger.info(f"Poll response: {response}")

    # There are 3 types of state of query
    # if state is failed - we stop and fail creation
    # if state is queued - we continue polling in 2 minutes
    # if state is succeeded - we stop and succeed creation
    if 'FAILED' == response['QueryExecution']['Status']['State']:
        logger.error("Query failed")
        raise ValueError("Query failed")

    if 'SUCCEEDED' == response['QueryExecution']['Status']['State']:
        logger.error("Query SUCCEEDED")
        return True

    if 'QUEUED' == response['QueryExecution']['Status']['State']:
        logger.error("Query QUEUED")
        return False

    # Return a resource id or True to indicate that creation is complete. if True is returned an id
    # will be generated
    # Return false to indicate that creation is not complete and we need to poll again
    return False

def handler(event, context):
    helper(event, context)

The Athena queries for view creation/updation/deletion are passed as environmental parameters to Lambda. In CloudFormation template we have to define the Lambda that invokes mentioned Python code and creates/updates/deletes Athena view. For example

  AthenaCommonViewLambda:
    Type: 'AWS::Lambda::Function'
    DependsOn: [CreateAthenaViewLayer, CreateAthenaViewLambdaRole]
    Properties:
      Environment:
        Variables:
          athena_workgroup: !Ref AudienceAthenaWorkgroup
          database:
            Ref: DatabaseName
          query_create: !Sub >-
            CREATE OR REPLACE VIEW ${TableName}_view AS
            SELECT field1, field2, ...
            FROM ${DatabaseName}.${TableName}
          query_drop: !Sub DROP VIEW IF EXISTS ${TableName}_common_view
      Code:
        S3Bucket: !Ref SourceS3Bucket
        S3Key: createview.zip
      FunctionName: !Sub '${AWS::StackName}_create_common_view'
      Handler: createview.handler
      MemorySize: 128
      Role: !GetAtt CreateAthenaViewLambdaRole.Arn
      Runtime: python3.8
      Timeout: 60
      Layers:
        - !Ref CreateAthenaViewLayer

  AthenaCommonView:
    Type: 'Custom::AthenaCommonView'
    Properties:
      ServiceToken: !GetAtt AthenaCommonViewLambda.Arn
Ihor Konovalenko
  • 1,298
  • 2
  • 16
  • 21
0

Maybe this snippet helps:

SomeView:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref Database
      TableInput:
        Name: v_some_table
        TableType: VIRTUAL_VIEW
        Parameters: {"presto_view": "true", "comment": "Presto View"}
        StorageDescriptor: 
          SerdeInfo:
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Columns:
            - Name: column1
              Type: bigint
            - Name: colmb2
              Type: int
        ViewExpandedText: '/* Presto View */'
        ViewOriginalText: !Join
          - ''
          - - '/* Presto View: '
            - Fn::Base64: !Sub |
                {
                  "originalSql": "SELECT\n  \"column1\"\n, \"column2\"\n\nFROM\n  some_table\n",
                  "catalog": "catalog",
                  "schema": "${Database}",
                  "columns": [
                    {
                      "name": "column1",
                      "type": "bigint"
                    },
                    {
                      "name": "column2",
                      "type": "integer"
                    },
                  ]
                }
            - ' */'
Roelant
  • 4,508
  • 1
  • 32
  • 62