I have a string that when displayed in Notepad++ reads:

App.xEFxBFxBF35

The token EFxBFxBF seems to be some UTF entity that some applications can't handle (for me it's Redshift).

In Notepad the string reads as

App.35

How can I remove this entity from a string in C#?

EDIT

In Visual Studio the string shows in the debugger as

"App.\uffff35"

EDIT 1

In the end it turned out that the column size needed to be doubled to hold the non-Latin characters I was inserting.

I had created the Redshift table by looking at the character length of the columns in SQL Server and using that number directly for the columns in Redshift. That worked for languages with Latin characters, but not for those with non-Latin characters.

I found the difference in length with this Redshift query:

select bit_length('M');
select bit_length('Б');
select bit_length('Ö');

This returns 8, 16, 16.
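
The same check can be made on the C# side before loading, since Redshift sizes `VARCHAR` columns in bytes rather than characters. The following is only a sketch: the class name `Utf8Length` and the helper `ByteCount` are made up for illustration, and it assumes the data reaches Redshift encoded as UTF-8.

```csharp
using System;
using System.Text;

public static class Utf8Length
{
    // Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
    public static int ByteCount(string value)
    {
        return Encoding.UTF8.GetByteCount(value);
    }

    public static void Main()
    {
        // "M", Cyrillic "Б" (U+0411) and precomposed "Ö" (U+00D6), written as
        // escapes so the source file encoding cannot change the result.
        string[] samples = { "M", "\u0411", "\u00D6" };

        foreach (string s in samples)
        {
            // Each sample is one UTF-16 char, but the UTF-8 byte count differs.
            Console.WriteLine("U+{0:X4}: {1} char, {2} byte(s)",
                (int)s[0], s.Length, ByteCount(s));
        }
    }
}
```

The three samples come out as 1, 2 and 2 bytes, which matches the 8, 16, 16 bits reported by the query. Doubling the column size covers characters that need two bytes; other characters can take up to four bytes in UTF-8, so measuring the longest real value may be safer than a fixed factor.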

Mathias F
  • http://www.fileformat.info/info/unicode/char/ef/index.htm and http://www.fileformat.info/info/unicode/char/bf/index.htm may be of interest. – mjwills Jul 18 '17 at 13:18
  • What data type are you using in Redshift? You should be using `VARCHAR` not `CHAR` - see http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html . – mjwills Jul 18 '17 at 13:18
  • I am using this version of Redshift and the column is a varchar: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1385 – Mathias F Jul 18 '17 at 13:24

1 Answer

You have two broad options:
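
As a rough illustration of one possible approach, and not necessarily either of the options meant above, the character can simply be stripped before the string is sent on. The sketch assumes the unwanted character is exactly the `\uffff` shown in the debugger (whose UTF-8 encoding is the bytes EF BF BF); the class name `NoncharacterStripper` and the method `StripFfff` are hypothetical.

```csharp
using System;

public static class NoncharacterStripper
{
    // Hypothetical helper: removes every occurrence of U+FFFF from the input.
    // String.Replace(string, string) does an ordinal search, so culture
    // settings do not affect the match.
    public static string StripFfff(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return input.Replace("\uffff", string.Empty);
    }

    public static void Main()
    {
        string raw = "App.\uffff35";
        Console.WriteLine(StripFfff(raw)); // prints "App.35"
    }
}
```

If other characters turn out to be rejected as well, the same pattern could be extended to a list of characters to drop, but which ones to include would depend on what the target system accepts.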

mjwills