How to skip headers when reading data from a CSV file in S3 and creating a table in AWS Athena

Question

I am trying to read CSV data from an S3 bucket and create a table in AWS Athena. When the table is created, it does not skip the header line of my CSV file.

Query example:

CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  `event_type_id` string,
  `customer_id` string,
  `date` string,
  `email` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "|",
  "quoteChar" = "\""
)
LOCATION 's3://location/'
TBLPROPERTIES ("skip.header.line.count"="1");

skip.header.line.count doesn't seem to work: the header line still comes back as a row. I think AWS has an issue with this. Is there any other way to get around it?
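To make the intended behavior concrete, here is a minimal Python sketch of what the table definition above asks for: drop the first line because of skip.header.line.count=1, then split each remaining line on | and strip the " quotes. This is plain standard-library code, not how Athena or OpenCSVSerde is implemented; the helper name read_table and the sample rows are hypothetical.

```python
import csv


def read_table(text, separator="|", quote='"', skip_header_lines=1):
    """Hypothetical helper mirroring the intent of the DDL above (illustration only)."""
    # skip.header.line.count=N: drop the first N physical lines of the file
    lines = text.splitlines(keepends=True)[skip_header_lines:]
    # separatorChar / quoteChar: split on '|' and strip surrounding '"' quotes;
    # every value stays a string, matching the all-string column types
    return list(csv.reader(lines, delimiter=separator, quotechar=quote))


sample = (
    "event_type_id|customer_id|date|email\n"
    '1|42|2017-08-01|"a@example.com"\n'
    "2|43|2017-08-02|b@example.com\n"
)

for row in read_table(sample):
    print(row)
```

With skip_header_lines=0, the first row returned is the header itself, which is the symptom described in the question.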

I just tried this today in Athena, and it seems to work as expected. — Shadi, Nov 18 '19 at 08:41

score 9 · Accepted Answer · answered Dec 08 '17 at 22:52

This is what works in Redshift, for a Redshift Spectrum external table:

Use TABLE PROPERTIES ('skip.header.line.count'='1'), optionally together with other properties such as 'numRows'='100' (a row-count statistic for the query planner, not a limit on the rows returned). Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
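One detail worth knowing: as I read the Redshift documentation, skip.header.line.count skips lines at the beginning of each source file under the location, not once per table. The Python sketch below models that for a folder holding two CSV files, each with its own header. The function name scan_location, the file names, and the data are hypothetical, and this is not Spectrum code. Because the table uses row format delimited, the sketch splits naively on commas with no quote handling.

```python
def scan_location(files, skip_header_lines=1, delimiter=","):
    """Hypothetical model of reading every file under an external table location."""
    rows = []
    # iterate in name order only so the example output is deterministic
    for name in sorted(files):
        # the header count is applied to each file separately
        body = files[name].splitlines()[skip_header_lines:]
        # 'fields terminated by' splits on the delimiter; quotes are not special
        rows.extend(line.split(delimiter) for line in body)
    return rows


files = {
    "myfolder/part-0001.csv": "ID,NAME\n1,alice\n2,bob\n",
    "myfolder/part-0002.csv": "ID,NAME\n3,carol\n",
}

print(scan_location(files))
```

Both "ID,NAME" lines are dropped because each sits at the start of its own file.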

TheWalkingData

Here's the AWS Redshift SQL documentation on CREATE EXTERNAL TABLE: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html – TheWalkingData Dec 11 '17 at 15:58

score 1 · Answer 2 · answered Aug 03 '17 at 23:11

This is a known limitation of Athena.

The best method I've seen was tweeted by Eric Hammond:

...WHERE date NOT LIKE '#%'

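This skips header lines at query time rather than in the table definition: any row whose date value begins with # is filtered out, which removes comment-style header lines that start with #. It also drops rows where date is NULL, because NULL NOT LIKE '#%' evaluates to NULL rather than true.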
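A small Python model of that WHERE clause shows what it removes and what it lets through. The function name and the rows are hypothetical, and this is not Athena code. Note that a plain header row like the OP's, where the date column simply holds the text date, passes the filter; for that file layout, a comparison such as WHERE date <> 'date' would be needed instead.

```python
def where_not_like_hash(rows, column="date"):
    """Mimic `... WHERE date NOT LIKE '#%'` on a list of dict rows."""
    kept = []
    for row in rows:
        value = row[column]
        # SQL: NULL NOT LIKE '#%' evaluates to NULL, so WHERE drops the row
        if value is None:
            continue
        # LIKE '#%' with no other wildcards is just a prefix test
        if not value.startswith("#"):
            kept.append(row)
    return kept


rows = [
    {"date": "#Fields: date time"},  # comment-style header line
    {"date": "date"},                # plain header row, like the OP's
    {"date": "2017-08-01"},
    {"date": None},
]

print(where_not_like_hash(rows))
```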

John Rotenstein

score 0 · Answer 3 · answered Nov 18 '19 at 08:40

As of today (2019-11-18), the query from the OP seems to work, i.e. skip.header.line.count is honored and the first line is indeed skipped.

Shadi
