I am getting black diamonds with question marks in values when I read data from SQL. I know this is caused by the encoding. What I am trying to do is replace those unknown characters with a space. I have found the Unicode code points of some of the characters:

["\u0060", "\u2018", "\u2019", "\u201C", "\u201D", "\uFFFD", "\u00A0", "\u1680", "\u180e", "\u2000", "\u2009", "\u200a", "\u200b", "\u202f", "\u205f", "\u3000", "\u2003"]

But some are still showing. Is there a list of these characters or code points, or a function that does this?

Ashwin
  • The issue may just be your viewer and not the data. The viewer you are using needs to support both the encoding and the font. The black diamond doesn't necessarily mean there is anything wrong. – jdweng Jul 19 '20 at 21:34
  • How are you displaying the values? – Andrew Williamson Jul 19 '20 at 22:11
  • What font are you using? That black-diamond-with-a-question-mark character is the Unicode REPLACEMENT CHARACTER (U+FFFD, https://en.m.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character). It generally means "this character can't be rendered in this font". There are *a lot* of Unicode characters (think Armenian, Tamil or Hangul), so good luck filtering them all out. – Flydog57 Jul 19 '20 at 23:31

1 Answer

I think encoding problems only occur with characters whose code is higher than 127, that is, outside the ASCII table. So you could convert any character whose code is greater than 127 to a space. This could produce some false positives, since valid non-ASCII characters would be replaced too, but maybe that is OK for you.
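A minimal sketch of that idea, assuming Python (the question does not say which language is in use). The function name `replace_non_ascii` and the sample string are made up for illustration:

```python
def replace_non_ascii(text: str, replacement: str = " ") -> str:
    """Return text with every character above code point 127 replaced.

    Hypothetical helper: it treats anything outside ASCII as suspect,
    so legitimate accented letters are replaced as well.
    """
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)


if __name__ == "__main__":
    # Sample containing an accented letter, a no-break space,
    # U+FFFD and curly quotes.
    sample = "caf\u00e9\u00a0menu \ufffd \u201cquoted\u201d"
    print(repr(replace_non_ascii(sample)))
```

Note that the `é` in the sample is replaced too, which is the false-positive case mentioned above. If the data is expected to contain real non-ASCII text, it may be safer to fix the encoding used when reading from the database than to filter characters afterwards.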

MundoPeter