1

I have a scenario for creating spectrum table in redshift using matillion. my CSV file data is like this:-

column1,column2,column3
abc,"qwety,pqr",xyz

but in spectrum table i am seeing data as

column1 column2 column3
abc qwerty pqr

Matillion is not taking quotes value as one. can you please suggest how to achieve this using matillion's EXTERNAL TABLE component.

Babbi
  • 65
  • 1
  • 2
  • 11

1 Answers1

0

Basically you would like to specify a quote parameter for your CSV data.

Redshift has 2 ways of specifying external tables (see Redshift Docs for reference):

  • using the default built-in SerDes and properties like ROW FORMAT DELIMITED, FIELDS TERMINATED BY
  • explicitly specifying a SerDe with ROW FORMAT SERDE, WITH SERDEPROPERTIES

I don't think it's possible to specify a quote parameter using the built-in SerDes.

It is possible to specify them using org.apache.hadoop.hive.serde2.OpenCSVSerde (look here for details on it's properties), but beware that there are know problems with it, as one described in this SO question.

Now for Metillion:

I have never used Matillion, but looking at their Redshift External Table documentation page, looks like it's only possible to specify the FORMAT and the FIELD TERMINATOR, but not to specify a SerDe and it's properties, hence it's not possible to specify the quote parameters for the external table - unless there are some undocumented means to specify a custom SerDe.

Personal note: We have experienced many problems with ingesting data stored as CSV, and we basically try to avoid it. There's no standard for CSV, each tool implements it's own version of support for it, and it's very difficult convince all your tools to see data the same way.

botchniaque
  • 4,698
  • 3
  • 35
  • 63