Basically you would like to specify a quote parameter for your CSV data.
Redshift has 2 ways of specifying external tables (see Redshift Docs for reference):
- using the default built-in SerDes and properties like
ROW FORMAT DELIMITED
, FIELDS TERMINATED BY
- explicitly specifying a SerDe with
ROW FORMAT SERDE
, WITH SERDEPROPERTIES
I don't think it's possible to specify a quote parameter using the built-in SerDes.
It is possible to specify them using org.apache.hadoop.hive.serde2.OpenCSVSerde
(look here for details on it's properties), but beware that there are know problems with it, as one described in this SO question.
Now for Metillion:
I have never used Matillion, but looking at their Redshift External Table documentation page, looks like it's only possible to specify the FORMAT
and the FIELD TERMINATOR
, but not to specify a SerDe and it's properties, hence it's not possible to specify the quote parameters for the external table - unless there are some undocumented means to specify a custom SerDe.
Personal note:
We have experienced many problems with ingesting data stored as CSV, and we basically try to avoid it. There's no standard for CSV, each tool implements it's own version of support for it, and it's very difficult convince all your tools to see data the same way.