I have created a table to perform queries from S3 data(JSON format, which is the output from SageMaker) using Athena.
The file extension is: "filename.json.out". The formatting is in "JSON" format.
The files are coming from the SageMaker batch transform job. The table that I have been trying to create has the format like this:
the "col10" key is basically an array of string input like "col10": ["value1", "value2"]
Now, I've tried to create a table in a database like below:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
`data` array<struct<
`col1`: string,
`col2`: string,
`col3`: string,
`col4`: string,
`col5`: string,
`col6`: string,
`col7`: string,
`col8`: string,
`col9`: string,
`col10`: array,
`col11`: string,
`col12`: string,
`col13`: string,
`col14`: string,
`col15`: string
>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket_name/object/'
TBLPROPERTIES ('has_encrypted_data'='false');
After creating the table, I tried to see the records by using the query below:
select * from db_name.table_name limit 10
Whenever I try to run the "select" query above, I find the error:
I have followed the links below:
Stackoverflow: How to Create Tables AWS Athena --> Mappings Json Array?
Stackoverflow: https://stackoverflow.com/a/50411984/9409770
And some others that I have forgotten. This problem is driving me crazy.
I have tried querying the same data format on a single file using "S3 Select", it worked without any problem. In Athena, I cannot make it work. I am sure I am making a silly mistake here. However, I cannot figure out where. I am not that expert in this field.
If there is any cheaper alternative to get data programmatically from S3 without incurring much cost, please share this.
Thank you.