In my program, I'm going to process some strings. These strings can be from any language.(eg. Japanese, Portuguese, Mandarin, English and etc.)
Sometime these strings may contain some HTML special characters like trademark symbol(™
), registered symbol(®
), Copyright symbol(©
) and etc.
Then I am going to generate an Excel sheet with these details. But when these is a special character, even though the excel file is created it can not be open since it is appeared to be corrupted.
So what I did is encode string before writing into excel. But what happened next is, all the strings except from English were encoded. The picture shows that asset description which is a Japanese language text is also converted into encoded text. But I wanted to encoded special characters only.
゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で is converted to ゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で
But I wanted only to encoded special characters.
So what I need is to identify whether the string contains that kind of special character.Since I am dealing with multiple languages, is there any possible way to identify whether the string contain a HTML special characters?