Amazon S3 has a new feature called "Select From", which lets you run simple SQL queries against simple data files such as CSV or JSON. So I thought I'd try it.
I created and uploaded the following CSV to my S3 bucket in Oregon (I consider this file to be extremely simple):
aaa,bbb,ccc
111,111,111
222,222,222
333,333,333
I indicated this was CSV with a header row and issued the following SQL:
select * from s3object s
...which worked as expected, returning:
111,111,111
222,222,222
333,333,333
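For anyone who wants to reproduce this through the API, here is a minimal sketch of what I believe the equivalent request looks like, assuming Python with boto3. The helper names (build_select_params, collect_records, run_select) are my own and not part of boto3; only select_object_content and its parameters come from the SDK. Any bucket and key you pass in are up to you; us-west-2 is the Oregon region.

```python
def build_select_params(bucket, key, expression, header_info="USE"):
    """Build keyword arguments for the S3 client's select_object_content()."""
    if header_info not in ("USE", "IGNORE", "NONE"):
        raise ValueError("header_info must be USE, IGNORE or NONE")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # "USE" is my assumption for "CSV with a header row".
        "InputSerialization": {"CSV": {"FileHeaderInfo": header_info}},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(event_stream):
    """Join the Records payloads of a select_object_content event stream."""
    chunks = []
    for event in event_stream:
        # The stream also carries Stats/End events, which are skipped here.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


def run_select(params, region_name="us-west-2"):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("s3", region_name=region_name)
    response = client.select_object_content(**params)
    return collect_records(response["Payload"])
```

Passing build_select_params(bucket, key, "select * from s3object s") to run_select should correspond to the working query above; treat it as a sketch that has not been run against a real bucket.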
Then I tried one of the provided sample queries, which failed:
select s._1, s._2 from s3object s
...with the error message "Some headers in the query are missing from the file. Please check the file and try again."
I also tried the following, receiving the same error each time:
select aaa from s3object s
select s.aaa from s3object s
select * from s3object s where aaa = 111
select * from s3object s where s.aaa = 111
select * from s3object s where s._1 = 111
So any time my query references a column, whether by name or by number, and whether in the SELECT or the WHERE clause, I get the "Some headers in the query are missing from the file" error. The AWS documentation provides no follow-up information on this error.
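To narrow this down, it may help to run every failing query under each header setting. As far as I can tell from the API reference, the CSV input option FileHeaderInfo accepts USE, IGNORE and NONE, with header names usable only under USE and positional references such as _1 meant for the other two. The sketch below (Python; repro_matrix is a hypothetical helper name) only enumerates the combinations and does not call AWS.

```python
from itertools import product

# The six column-referencing queries from above, verbatim.
FAILING_QUERIES = [
    "select s._1, s._2 from s3object s",
    "select aaa from s3object s",
    "select s.aaa from s3object s",
    "select * from s3object s where aaa = 111",
    "select * from s3object s where s.aaa = 111",
    "select * from s3object s where s._1 = 111",
]

# Values of the CSV FileHeaderInfo input setting.
HEADER_MODES = ("USE", "IGNORE", "NONE")


def repro_matrix(queries=FAILING_QUERIES, modes=HEADER_MODES):
    """Return one test case per (header mode, query) pair."""
    return [
        {"FileHeaderInfo": mode, "Expression": query}
        for mode, query in product(modes, queries)
    ]


for case in repro_matrix():
    print(case["FileHeaderInfo"], "|", case["Expression"])
```

Recording which of the 18 cases fail with the "headers" error would show whether the problem follows the header setting or the file itself.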
So my question is, what's wrong? Is there an undocumented requirement about the column headers? Is there an undocumented way to reference columns? Does the "Select From" feature have a bug in it?