The screenshot below shows the special characters I encountered. They look like full-width English letters, but I have compared them and they are not. What are these special characters, and how can I turn them into normal letters?

[Screenshot: the strings as shown in IDEA]

String special = "ΡΕΝΑ";
String em = "ＰＥＮＡ";
String normal = "PENA";

1 Answer

Assuming that the characters that you have copy-and-pasted into your question are correct ...

The characters in special are Greek letters that (in most fonts) look like Latin letters.

  • 'Ρ' is the Greek uppercase rho character (U+03A1)
  • 'Ε' is the Greek uppercase epsilon character (U+0395)
  • 'Ν' is the Greek uppercase nu character (U+039D)
  • 'Α' is the Greek uppercase alpha character (U+0391)

There is no correct conversion of these characters to Latin letters. (They are homoglyphs for the Latin letters.)
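If you are unsure what a look-alike character really is, one way to check is to print the code point, Unicode name, and script of each character. The following is only a sketch: the class and method names are made up for illustration, and the strings are written with Unicode escapes so that the result does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;

class InspectCodePoints {
    // Describes each code point of s as "U+XXXX NAME [SCRIPT]".
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X %s [%s]",
                cp, Character.getName(cp), Character.UnicodeScript.of(cp))));
        return out;
    }

    public static void main(String[] args) {
        String special = "\u03A1\u0395\u039D\u0391"; // the Greek look-alikes of "PENA"
        String normal = "PENA";                      // plain Latin letters
        describe(special).forEach(System.out::println); // first line: U+03A1 GREEK CAPITAL LETTER RHO [GREEK]
        describe(normal).forEach(System.out::println);  // first line: U+0050 LATIN CAPITAL LETTER P [LATIN]
    }
}
```

Two strings that render identically but report different scripts here are homoglyphs, not the same text.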

... how can these Greek capital letters be converted into normal English letters?

They can't. The characters are not equivalent. They mean something different. If you encounter those characters in a string, they should not be converted to Latin characters.

(But if you insist, here is a library that purports to do the job: https://github.com/codebox/homoglyph. Use at your own risk!)


The characters in em are Unicode fullwidth characters. For example, 'Ｅ' is U+FF25, which the Unicode code charts describe as "FULLWIDTH LATIN CAPITAL LETTER E".

You can convert any fullwidth characters in a Java string to regular characters with java.text.Normalizer, using the NFKC normalization form.
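A minimal sketch of that conversion, assuming the class name FullwidthToAscii and the helper fold (both invented for this example). Note that NFKC applies all compatibility mappings, not only the fullwidth ones, so it may also change other characters such as ligatures or superscript digits.

```java
import java.text.Normalizer;

class FullwidthToAscii {
    // NFKC applies compatibility mappings, which fold fullwidth forms
    // to their ordinary counterparts.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String em = "\uFF30\uFF25\uFF2E\uFF21";      // fullwidth "PENA"
        String special = "\u03A1\u0395\u039D\u0391"; // Greek look-alikes
        System.out.println(fold(em).equals("PENA"));      // true
        System.out.println(fold(special).equals("PENA")); // false: the Greek letters are left unchanged
    }
}
```

The second result illustrates the point made earlier: normalization does not turn the Greek letters into Latin ones, because they are not compatibility variants of them.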

Stephen C
  • The characters in `em` are indeed fullwidth characters, but I don't think OP is talking about them. I think OP is talking about the characters in the string `special`, which are Greek letters. The `em` string is just there for comparison. – Sweeper Aug 16 '21 at 07:18
  • I tried to compare the codes of the two characters: `String special = "ΡΕΝΑ"; char specialP = 'Ρ'; System.out.println("special ASCII:" + (int) specialP); String em = "ＰＥＮＡ"; char emP = 'Ｐ'; System.out.println("em ASCII:" + (int) emP);` This prints `special ASCII:929` and `em ASCII:65328`. – jiangxiaopeng Aug 16 '21 at 07:19
  • Thank you for your answer, but how can these Greek capital letters be converted into normal English letters? – jiangxiaopeng Aug 16 '21 at 07:40
  • Thank you very much for your answer. My approach is to remove these characters as special characters – jiangxiaopeng Aug 16 '21 at 10:37
  • 1
    @jiangxiaopeng Do not examine individual characters using `char` type. That type is obsolete, unable to represent even half of the characters defined by Unicode and supported by Java. Instead, learn to use Unicode code point integers. Call methods such as `String#codePoints`. – Basil Bourque Aug 16 '21 at 16:57
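The last two comments can be combined into a rough sketch: remove the Greek characters while iterating over code points instead of `char` values. The class name StripGreek and the method removeGreek are hypothetical, and whether silently dropping characters is acceptable depends on the application.

```java
class StripGreek {
    // Drops every code point whose Unicode script is GREEK; everything else is kept.
    static String removeGreek(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.GREEK)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Greek "PENA" look-alikes embedded in otherwise Latin text.
        String input = "ID-\u03A1\u0395\u039D\u0391-42";
        System.out.println(removeGreek(input)); // ID--42

        // A character outside the Basic Multilingual Plane takes two char values
        // but is a single code point, so it passes through intact.
        String emoji = "\uD83D\uDE00";
        System.out.println(removeGreek("\u03A1" + emoji + "A").equals(emoji + "A")); // true
    }
}
```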