i'm not able to query S3 files with Aws Athena, the content of the files are regular json arrays like this:
[
{
"DataInvio": "2020-02-06T13:37:00+00:00",
"DataLettura": "2020-02-06T13:35:50+00:00",
"FlagDownloaded": 0,
"GUID": "f257c9c0-b7e1-4663-8d6d-97e652b27c10",
"IMEI": "866100000062167",
"Id": 0,
"IdSessione": "4bd169ff-307c-4fbf-aa63-fce972f43fa2",
"IdTagLocal": 0,
"SerialNumber": "142707160028BJZZZZ",
"Tag": "E200001697080089188056D2",
"Tipo": "B",
"TipoEvento": "L",
"TipoSegnalazione": 0,
"TipoTag": "C",
"UsrId": "10642180-1e34-44ac-952e-9cb3e8e6a03c"
},
{
"DataInvio": "2020-02-06T13:37:00+00:00",
"DataLettura": "2020-02-06T13:35:50+00:00",
"FlagDownloaded": 0,
"GUID": "e531272e-465c-4294-950d-95a683ff8e3b",
"IMEI": "866100000062167",
"Id": 0,
"IdSessione": "4bd169ff-307c-4fbf-aa63-fce972f43fa2",
"IdTagLocal": 0,
"SerialNumber": "142707160028BJZZZZ",
"Tag": "E200341201321E0000A946D2",
"Tipo": "B",
"TipoEvento": "L",
"TipoSegnalazione": 0,
"TipoTag": "C",
"UsrId": "10642180-1e34-44ac-952e-9cb3e8e6a03c"
}
]
a simple query select * from mytable
returns empty rows if the table has been generated in this way
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (
`IdSessione` string,
`DataLettura` date,
`GUID` string,
`DataInvio` date
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'ignore.malformed.json' = 'true'
) LOCATION 's3://athenatestsavino/files/anthea/'
TBLPROPERTIES ('has_encrypted_data'='false')
or it gives me an error HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]
if the table has been generated with:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable(
`IdSessione` string,
`DataLettura` date,
`GUID` string,
`DataInvio` date
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://athenatestsavino/files/anthea/'
TBLPROPERTIES ('has_encrypted_data'='false')
if i modify the content of the file in this way (an json object each rows without trailing commas, the query gives me results)
{ "DataInvio": "2020-02-06T13:37:00+00:00", "DataLettura": "2020-02-06T13:35:50+00:00",....}
{ "DataInvio": "2020-02-07T13:37:00+00:00", "DataLettura": "2020-02-06T13:35:50+00:00",....}
How to query json array structures directly?