11

We have a file in S3 that is loaded in to Redshift via the COPY command. The import is failing because a VARCHAR(20) value contains an Ä which is being translated into .. during the copy command and is now too long for the 20 characters.

I have verified that the data is correct in S3, but the COPY command does not understand the UTF-8 characters during import. Has anyone found a solution for this?

Elliot Chance
  • 5,526
  • 10
  • 49
  • 80

5 Answers5

16

tl;dr

the byte length for your varchar column just needs to be larger.

Detail

Multi-byte characters (UTF-8) are supported in the varchar data type, however the length that is provided is in bytes, NOT characters.

AWS documentation for Multibyte Character Load Errors states the following:

VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes.

Therefore if you want the character Ä to be allowed, then you need to allow 2 bytes for this character, instead of 1 byte.

AWS documentation for VARCHAR or CHARACTER VARYING states the following:

... so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.

For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8

Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.

Community
  • 1
  • 1
Adrian Torrie
  • 2,795
  • 3
  • 40
  • 61
1

Please check below link

http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html

You should use ACCEPTINVCHARS in you copy command. Details here

http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html#acceptinvchars

Sandesh Deshmane
  • 2,247
  • 1
  • 22
  • 25
0

I have a similar experience which some only characters like Ä were not copied correctly when loading mysqldump data into our Redshift cluster. It was because the encoding of mysqldump was latin1 which is the default character set of mysql. It's better to check the character encoding of the files for COPY first. If the encoding of your files are not UTF-8, you have to encode your files.

Masashi M
  • 2,679
  • 21
  • 22
0

Using "ACCEPTINVCHARS ESCAPE" in the copy command solved the issue for us with minor data alteration.

Sailendra Pinupolu
  • 1,038
  • 1
  • 10
  • 8
0

You need to increase the size of your varchar column. Check the stl_load_errors table, see what is the actual field value length for the failed rows and accordingly increase the size. EDIT: Just realized this is a very old post, anyways if someone need it..

SwapSays
  • 407
  • 7
  • 18