0

In my program, I process strings that can be in any language (e.g. Japanese, Portuguese, Mandarin, English, etc.).

Sometimes these strings contain HTML special characters such as the trademark symbol (™), registered symbol (®), copyright symbol (©), etc.

I then generate an Excel sheet with these details. But when there is a special character, the Excel file is created but cannot be opened because it appears to be corrupted.
So I encoded each string before writing it to Excel. But then every string except the English ones was encoded. The picture shows that the asset description, which is Japanese text, was also converted to encoded text. I wanted to encode only the special characters.

For example, ゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で is converted to its encoded form, but I wanted to encode only the special characters.

So what I need is a way to identify whether a string contains that kind of special character. Since I am dealing with multiple languages, is there any way to identify whether a string contains HTML special characters?

Punuth
  • Why do you want to know whether the string has a special character? How is that a problem? *"Since I am dealing with multiple languages"* - the *code* of those characters stays the same regardless of language (each language can add more *special* characters, however; the question is what makes them special), so the question is quite vague. – Sinatr Aug 03 '16 at 10:01
  • Possible duplicate of [Check for special characters (/\*-+\_@&$#%) in a string?](http://stackoverflow.com/questions/4503542/check-for-special-characters-in-a-string) – Sinatr Aug 03 '16 at 10:09
  • Actually, I am going to write these strings into an MS Excel sheet. If a string contains any special character, the generated Excel sheet appears corrupted. So I encoded each string before writing it to the sheet. But then all text in languages other than English was also encoded. That is why I need to identify whether a string contains those special characters. – Punuth Aug 03 '16 at 10:15
  • See [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You could present the problem with Excel (ask a new question, include code, and explain what the problem is) instead of asking to fix the attempted solution. – Sinatr Aug 03 '16 at 10:17
  • How are you creating that Excel sheet? Add an [mcve]. – rene Aug 03 '16 at 10:39

3 Answers

3

Try this using the Regex.IsMatch Method:

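using System;
using System.Text.RegularExpressions;

string str = "*!#©™®";
// Matches any character other than ASCII letters, digits, '_' and '.',
// so letters from non-English scripts are flagged as well.
var regx = new Regex("[^a-zA-Z0-9_.]");
if (regx.IsMatch(str))
{
    Console.WriteLine("Special character(s) detected.");
}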

See the Demo
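
To see how this pattern behaves on multilingual input, here is a minimal sketch. The class and method names (`AsciiWhitelistCheck`, `HasNonWhitelisted`) are illustrative assumptions, not part of the answer. Because the character class only whitelists ASCII letters, digits, `_`, and `.`, Japanese letters and even a plain space are reported alongside ©, ™, and ®.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from the answer above.
public static class AsciiWhitelistCheck
{
    // Anything that is NOT an ASCII letter, digit, underscore, or period.
    private static readonly Regex NonWhitelisted = new Regex("[^a-zA-Z0-9_.]");

    public static bool HasNonWhitelisted(string text)
    {
        return NonWhitelisted.IsMatch(text);
    }
}

public static class AsciiWhitelistDemo
{
    public static void Main()
    {
        Console.WriteLine(AsciiWhitelistCheck.HasNonWhitelisted("Asset_1.0"));    // False
        Console.WriteLine(AsciiWhitelistCheck.HasNonWhitelisted("Asset 1.0"));    // True (space)
        Console.WriteLine(AsciiWhitelistCheck.HasNonWhitelisted("Acme\u2122"));   // True (trademark sign)
        Console.WriteLine(AsciiWhitelistCheck.HasNonWhitelisted("\u308A\u3085")); // True (hiragana letters)
    }
}
```

So this check is probably only suitable when the input is expected to be plain ASCII identifiers.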

Raktim Biswas
1

Try the Regex.Replace method:

// Remove letters and digits, then check whether any non-whitespace characters are left.
// Whatever remains is something like $, @, ^, or ©.
//
// [\p{L}\p{Nd}]+ matches runs of letters and decimal digits in any language.
// 'input' is the string to test.
if (!string.IsNullOrWhiteSpace(Regex.Replace(input, @"([\p{L}\p{Nd}]+)", "")))
{
    // Do whatever with the string.
}

Detection demo.

  • Thanks a lot for your answer. Is there a way to get a list of all special characters rather than hard-coding them in the Regex? – Punuth Aug 03 '16 at 10:17
  • And done! Updated the answer and added a Russian test in the demo. It should work for any language now, hopefully. –  Aug 03 '16 at 10:47
  • Yep, it seems OK. But in the Japanese text ゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で it drops the very first character ' ゜ '. – Punuth Aug 03 '16 at 14:35
  • It treats any type of punctuation as *special characters*. I'll have a look to see if it can target punctuation as well. –  Aug 04 '16 at 01:25
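
The behaviour discussed in these comments can be traced to Unicode general categories: `\p{L}` and `\p{Nd}` cover letters and decimal digits only, so anything in a symbol or punctuation category is left over and reported. The sketch below prints the category of a few characters; it assumes the leading character of the Japanese sample is U+309C, and the names `CategoryInspector` and `CategoryOf` are made up for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reports the Unicode general category of a character.
public static class CategoryInspector
{
    public static UnicodeCategory CategoryOf(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c);
    }
}

public static class CategoryDemo
{
    public static void Main()
    {
        // U+309C (assumed to be the leading sample character), two letters,
        // a comma, the copyright sign, and the trademark sign.
        foreach (char c in "\u309C\u3065\u6C27,\u00A9\u2122")
        {
            Console.WriteLine("U+{0:X4} {1}", (int)c, CategoryInspector.CategoryOf(c));
        }
        // U+309C ModifierSymbol
        // U+3065 OtherLetter
        // U+6C27 OtherLetter
        // U+002C OtherPunctuation
        // U+00A9 OtherSymbol
        // U+2122 OtherSymbol
    }
}
```

Under that assumption, ゜ is a modifier symbol rather than a letter, which would explain why it is handled differently from the rest of the Japanese text.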
0

I suppose you could start by treating your string as a Char array: https://msdn.microsoft.com/en-us/library/system.char(v=vs.110).aspx Then you can examine each character in turn. Indeed, on a second read of that manual page, why not use Char.IsSymbol:

string s = "Sometime these strings may contain some HTML special characters like trademark symbol(™), registered symbol(®), Copyright symbol(©) and etc.゜祌づ りゅ氧廩, 駤びょ菣 鏥こ埣槎で";
Char[] ca = s.ToCharArray();
// Char.IsSymbol is true for the MathSymbol, CurrencySymbol, ModifierSymbol, and OtherSymbol categories.
foreach (Char c in ca)
{
    if (Char.IsSymbol(c))
        Console.WriteLine("found symbol:{0} ", c);
}
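
Since `Char.IsSymbol` also covers modifier symbols, a character such as ゜ (if it is U+309C) would be reported too. A narrower variant, sketched here under the assumption that only the `OtherSymbol` category matters (which is where ™, ®, and © are classified), could look like this. `SymbolCheck` and `ContainsOtherSymbol` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: true if any UTF-16 code unit is in the OtherSymbol category.
public static class SymbolCheck
{
    public static bool ContainsOtherSymbol(string text)
    {
        return text.Any(c => CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.OtherSymbol);
    }
}

public static class SymbolCheckDemo
{
    public static void Main()
    {
        // Letters, a comma, a space, and U+309C (ModifierSymbol) are not flagged.
        Console.WriteLine(SymbolCheck.ContainsOtherSymbol("\u309C\u3065, \u6C27")); // False
        // The trademark and copyright signs are OtherSymbol.
        Console.WriteLine(SymbolCheck.ContainsOtherSymbol("Acme\u2122"));           // True
        Console.WriteLine(SymbolCheck.ContainsOtherSymbol("\u00A9 2016"));          // True
    }
}
```

This inspects one `char` at a time, so symbols outside the Basic Multilingual Plane (stored as surrogate pairs) are not detected by this sketch.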
SlightlyKosumi