I have created a table in AWS Athena like this:
CREATE EXTERNAL TABLE IF NOT EXISTS default.test_line_breaks (
col1 string,
col2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://bucket/test/'
In that S3 location I put a simple CSV file with the following content:
rec1 col1,rec2 col2
rec2 col1,"rec2, col2"
rec3 col1,"rec3
col2"
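For reference, the small Python sketch below shows why this file is tricky. It does not use Athena; it assumes the file uses plain \n line endings. Python's csv module parses quotes first, so the newline inside "rec3\ncol2" stays inside the field and the file has three records. Splitting on newlines first, which is roughly what a newline-delimited text input does before any SerDe sees the data, gives four physical lines and cuts the third record in half.

```python
import csv
import io

# The sample file from the question (assumption: \n line endings).
data = 'rec1 col1,rec2 col2\nrec2 col1,"rec2, col2"\nrec3 col1,"rec3\ncol2"\n'

# Quote-aware parsing: a newline inside a quoted field stays part of the field.
records = list(csv.reader(io.StringIO(data), quotechar='"', escapechar='\\'))

# Line-first splitting: the quoted newline ends the "record" early.
physical_lines = data.splitlines()

for row in records:
    print(row)
print(len(records), "records vs", len(physical_lines), "physical lines")
```

The third parsed record is ['rec3 col1', 'rec3\ncol2'], while the line-based split yields 'rec3 col1,"rec3' and 'col2"' as two separate lines.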
When I run the data preview query SELECT * FROM "default"."test_line_breaks" limit 10;
Athena returns the following response:
How should I set ROW FORMAT to properly handle line breaks within field values, so that rec3\ncol2 appears in col2?