I have json data that looks something like:
{ "col1" : 123, "metadata" : { "opt1" : 456, "opt2" : 789 } }
where the various metadata fields (of which there are many) are optional and may or may not be present.
My query is:
select col1, metadata.opt1 from "db-name".tablename
If opt1
is not present in any rows, I would expect this to return all rows with a blank for the opt1
column, but if there wasn't a row with the opt1
in metadata
when the crawler ran (and might still not be present in data when the query is run, as it's optional), the query fails, with:
SYNTAX_ERROR: line 2:1: Column '"metadata"."opt1"' cannot be resolved
I could specify these fields manually either in the schema definition (if I don't use a crawler), but then it wouldn't pick up any new metadata fields that may arrive, and specifying a static schema doesn't seem to be in the spirit of how Athena is supposed to work.
How do I get this to function as expected (preferably without putting dummy rows in or customizing the SerDe)?
Using SerDe org.openx.data.jsonserde.JsonSerDe
at present.
Thanks for any ideas.