1

I have my data in CSV format in the below form:

Id -> tinyint

Name -> String

  Id  Name
  1   Alex
  2   Sam

When I export the CSV file to S3 and create an Athena table, the data transform into the following format.

 Id   Name
 1    "Alex"
 2    "Sam"

How do I get rid of the double quotes while creating the table?

Any help is appreciated.

dtolnay
  • 9,621
  • 5
  • 41
  • 62
AswinRajaram
  • 1,519
  • 7
  • 18

1 Answers1

1

By default if SerDe is not specified, Athena is using LasySimpleSerDe, it does not support quoted values and reads quotes as a part of value. If your CSV file contains quoted values, use OpenCSVSerde (specify correct separatorChar if it is not comma):

CREATE EXTERNAL TABLE mytable(
  id   tinyint,
  Name string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
 'separatorChar' = ',',
   'quoteChar' = '\"', 
   'escapeChar' = '\\'
)
LOCATION 's3://my-bucket/mytable/' 
;

Read the manuals: https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html See also this answer about data types in OpenCSVSerDe

leftjoin
  • 36,950
  • 8
  • 57
  • 116