everyone. I have a nested json object. I am trying to create a table which I will then query. I am struggling to see where I could be going wrong. I have tried as was sugested in this post and followed this tutorial and have yet to create a table with actual readable data.
[{
"player": "Charlie",
"club": {
"position": "Attacking Midfield",
"competitor": "Bardsley",
"offense": [{
"shots": 13,
"goals": 1,
"close_range": 3
"fouls_against": 2
}, {
"shots": 13,
"goals": 1,
"close_range": 3
"fouls_against": 2
}
],
"defense": [{
"tackle": 0,
"interception": 1,
"blocked_shots": 0
"fouls": 5
}, {
"tackle": 3,
"interception": 4,
"blocked_shots": 3
"fouls": 6
}
],
},
"training_schedule": [
{
"training_name": "Piggy in the middle",
"coach": "Grant Wool"
"training_start": "2008-03-02T14:00:00.000Z"
}, {
"training_name": "Weight training",
"coach": "John Smith"
"training_start": "2008-03-02T16:00:00.000Z"
}, {
"training_name": "Tactical Video Session",
"coach": "Eusebius Pontiff"
"training_start": "2008-03-02T18:00:00.000Z"
}, {
"training_name": "Cross Country Run",
"coach": "John Smith"
"training_start": "2008-03-04T12:00:00.000Z"
}, {
"training_name": "Offensive Possession Play",
"coach": "Grant Wool"
"training_start": "2008-03-04T16:00:00.000Z"
}, {
"training_name": "Attacking Set Pieces",
"coach": "Grant Wool"
"training_start": "2008-03-05T12:00:00.000Z"
}, {
"training_name": "Practice game (6 a side)",
"coach": "Grant Wool"
"training_start": "2008-03-05T14:00:00.000Z"
}
]
}]
As you can see this is a nested json with all kinds of goodness. I am trying to create a table using this data to find the best players for the weekend. The problem I have is that when I load this data and attempt to create the table it fails with a none too clear message as to why. Here is what I have tried on AWS Athena:
CREATE EXTERNAL TABLE footie.players(
player array<struct<
player: string,
game_stats struct<
position: string,
competitor: string,
offense: array<struct<shots: int, goals: int, close_range: int, fouls_against: int>>,
defense: array<struct<tackle: int, interception: int, blocked_shots: int, fouls: int>>
>,
training_schedule: array<struct<
training_name: string,
coach: string
training_start: string>
>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'paths'='array')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://myprojects/footie.json'
I keep getting "service: amazonathena; status code: 400; error code: invalidrequestexception". The crawler is just as bad giving me empty rows of data. I'm at a loss whether I should try changing the file format as has been suggested in other posts and if so what is the correct format I should be going for?