5

I'm trying to create an external table in Redshift from a csv that has quote escaped quotes in it, as documented in rfc4180:

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

For example: "aaa","b""bb","ccc"

I get no errors but the final table has a null value where my string should be.

Is there a way to tell Redshift to understand this csv format when creating an external table?

I do not want to change the formatting of the csv file.


Example csv:

"some ""text""",some more text,"more, text",and more

Example external table creation:

create external table spectrum.spectrum_test_quote(
  a varchar(32),
  b varchar(32),
  c varchar(32),
  d varchar(32)
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',
  'quoteChar' = '\"',
  'escapeChar' = '\\'
)
stored as textfile
location 's3://';

Expected results:

  • field a: some "text"
  • field b: some more text
  • field c: more, text
  • field d: and more

Actual result:

  • field a: null
  • field b: some more text
  • field c: more, text
  • field d: and more
Community
  • 1
  • 1
Tom Rea
  • 51
  • 1
  • 2
  • Can you check whether glue/athena works ok? can you check whether this works better with org.apache.hadoop.hive.serde2.RegexSerDe ? (you will need to set up a good 'input.regex' – Jon Scott Jan 30 '19 at 18:03
  • We were having similar problem with CSVs. Redshift was not showing the data, but Athena was. Luckily we were able to get same data as JSON, which we did. I'd be curious to know if you manege to solve the problem. – botchniaque Feb 01 '19 at 14:03
  • @botchniaque no solution so far. I'm able to get the data in AVRO rather than CSV so think I'm going to go down that rout. – Tom Rea Feb 02 '19 at 16:08

0 Answers0