How to replace a non-English character with an English character

Question

I have a strange problem. I'm getting text from Google Cloud Vision that contains non-English characters where the original text has plain English characters. It is a mistake made by the Google Cloud Vision OCR.

I'm getting a word like this: Héllo

Notice that é is a non-English character.

I want to convert it into plain "Hello" so I can process this word.

I'm not looking for a programming answer. I'm just looking for ways to do this.

Any hint would be useful.

Thanks!

I have no clue how to perform this task. That's why I had to post it here. I have text from Google Cloud Vision and I don't know how I am supposed to do this. I'm just trying to find ways to do it. Hope you understand. :) – Abhishek Deshkars, Jun 10 '20 at 05:50

score 0 · Accepted Answer · answered Jun 10 '20 at 05:50

If Apache Commons is an option for you, you could use the StringUtils class from Apache Commons Lang. Its stripAccents method should suit your needs. From the source code you can see that it relies on java.text.Normalizer, so you could also look into that.
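
Since the asker's follow-up comment mentions Python, here is a minimal sketch of the same idea (decompose each character, then drop the accent marks) using Python's standard `unicodedata` module. The function name `strip_accents` is only an illustrative choice, and this is not the Apache Commons implementation.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed."""
    # NFD splits an accented letter into a base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, so keep the rest.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Héllo"))  # Hello
```

Letters that have no decomposition, such as "ø" or "ł", pass through unchanged with this approach, so OCR output containing them would need an explicit replacement table.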

vox

Thanks a lot! I didn't know the term "strip accents". I found the answer after searching for it. Here is the answer for Python: https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string – Abhishek Deshkars Jun 10 '20 at 05:54