2

I am trying to create an Athena table for s3 server access logs in my python cdk code with the help of this AWS link: https://aws.amazon.com/premiumsupport/knowledge-center/analyze-logs-athena/

The table gets created successfully from the Athena console but gives a - no viable alternative at input "CREATE EXTERNAL" error while I try to do the same using the Python code.

I am not able to figure out the problem here This is my python code:

query_string="""CREATE EXTERNAL TABLE IF NOT EXISTS bucket_logs_db.access_logs(
                                            BucketOwner STRING,
                                            Bucket STRING,
                                            RequestDateTime STRING,
                                            RemoteIP STRING,
                                            Requester STRING,
                                            RequestID STRING,
                                            Operation STRING,
                                            Key STRING,
                                            RequestURI_operation STRING,
                                            RequestURI_key STRING,
                                            RequestURI_httpProtoversion STRING,
                                            HTTPstatus STRING,
                                            ErrorCode STRING,
                                            BytesSent BIGINT,
                                            ObjectSize BIGINT,
                                            TotalTime STRING,
                                            TurnAroundTime STRING,
                                            Referrer STRING,
                                            UserAgent STRING,
                                            VersionId STRING,
                                            HostId STRING,
                                            SigV STRING,
                                            CipherSuite STRING,
                                            AuthType STRING,
                                            EndPoint STRING,
                                            TLSVersion STRING
                                        )
                                        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
                                        WITH SERDEPROPERTIES (
                                                 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\]
                                                 ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\"
                                                 (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\")
                                                 ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$' )
                                        LOCATION 's3://' + f'{bucket.bucket_name}' + /prefix'"""
dtolnay
  • 9,621
  • 5
  • 41
  • 62
s_neenu
  • 95
  • 2
  • 7
  • 2
    Probably the issue is with how am using ```LOCATION 's3://' + f'{bucket.bucket_name}' + /prefix'```. Suggestions? – s_neenu Jul 30 '20 at 10:23
  • yes looks like your location seems to be invalid. Try printing out the s3://' + f'{bucket.bucket_name}' + /prefix before passing to query to see if it is passing valid S3 lcoation? – Prabhakar Reddy Jul 30 '20 at 15:58
  • Bucket location is valid, I doubt if concatenating like this works for a string in python.. – s_neenu Jul 31 '20 at 05:44
  • Can you just print this concatenated string alone to see if it is creating valid S3 string? – Prabhakar Reddy Jul 31 '20 at 05:56
  • Take a look at this answer : https://github.com/aws/aws-cdk/issues/3665#issuecomment-610604199 – Amit Baranes Jul 31 '20 at 09:55
  • 1
    The code works if I hardcode the bucket location. Can someone suggest how should I give the location here? I want to fetch the bucket name from s3 bucket and put it into this python string as in code. – s_neenu Aug 02 '20 at 14:43

0 Answers0