I would like to remove all special characters from my UTF-8 text, but I can't find a regular expression that does it.

My text looks like this:

ASDÉÁPŐÓÖŰ_->,.!"%=%!HMHF

I would like to remove only these characters: _->,.!"%=%!

I tried this regex:

result = Regex.Replace(text, @"([^a-zA-Z0-9_]|^\s)", "");

But it also removes my accented (non-ASCII) characters.

I don't want to remove the accented characters, but I do want to remove all the symbols and punctuation.

tixovoxi
  • Define "special chars" please. Unicode contains tens of thousands of characters that are categorised: stating which categories you want to keep would be a start (and "utf8 char" has no meaning; UTF-8 is merely an encoding of Unicode code points into an octet stream, and it says nothing about character taxonomy). – Richard Jun 07 '16 at 10:44
  • `\P{L}` should match anything NOT a *letter*. – SamWhan Jun 07 '16 at 10:45
  • I don't think it's a duplicate. I don't need to determine whether the string contains UTF-8 or not. I want to remove all symbols and other such characters from a UTF-8 string. I don't want to remove the accented characters... – tixovoxi Jun 07 '16 at 10:45

2 Answers

Regex.Replace(text, @"([^\w]|_)", "")
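In .NET, `\w` is Unicode-aware by default, so accented letters survive the pattern above while the underscore is removed explicitly. As a minimal sketch of the same idea written with explicit Unicode categories (the class and method names here are hypothetical, chosen only for illustration), assuming the goal is to keep letters and decimal digits and drop everything else:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class Cleaner
{
    // Removes every character that is neither a Unicode letter (\p{L})
    // nor a Unicode decimal digit (\p{Nd}).
    // Assumption: accented letters are precomposed (NFC). Combining marks
    // from decomposed text are category Mn and would be removed here.
    public static string KeepLettersAndDigits(string text)
    {
        return Regex.Replace(text, @"[^\p{L}\p{Nd}]+", "");
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "ASDÉÁPŐÓÖŰ_->,.!\"%=%!HMHF";
        // The returned string is ASDÉÁPŐÓÖŰHMHF (console rendering
        // depends on the console's output encoding).
        Console.WriteLine(Cleaner.KeepLettersAndDigits(text));
    }
}
```

Unlike `\w`, this class does not match the underscore, so no separate `|_` alternative is needed.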
filhit

Do you want only numbers and letters?

Then this is your solution:

result = Regex.Replace(text, "[^0-9a-zA-Z]+", "");

You could also specify a range in the ASCII table if you want custom control over which characters stay in your string. For example, this removes everything outside the ASCII range (0x00 to 0x7F):

result = Regex.Replace(text, @"[^\x00-\x7F]+", "");
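Note that an ASCII-only class also strips the accented letters from the question's sample text. The sketch below contrasts the ASCII-only pattern with a custom range that additionally keeps U+00C0 to U+024F; that range is an assumption chosen to cover the Hungarian letters in the sample, and the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RangeFilter
{
    // Keeps ASCII letters and digits only; accented letters are removed.
    public static string AsciiAlnumOnly(string text)
    {
        return Regex.Replace(text, "[^0-9a-zA-Z]+", "");
    }

    // Keeps ASCII letters and digits plus the code points U+00C0..U+024F
    // (Latin-1 Supplement letters through Latin Extended-B).
    // Caveat: this range also contains the symbols × (U+00D7) and ÷ (U+00F7).
    public static string AlnumPlusLatinExtended(string text)
    {
        return Regex.Replace(text, @"[^0-9a-zA-Z\u00C0-\u024F]+", "");
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "ASDÉÁPŐÓÖŰ_->,.!\"%=%!HMHF";
        // Returns ASDPHMHF: the accented letters are gone.
        Console.WriteLine(RangeFilter.AsciiAlnumOnly(text));
        // Returns ASDÉÁPŐÓÖŰHMHF: the accented letters are kept.
        Console.WriteLine(RangeFilter.AlnumPlusLatinExtended(text));
    }
}
```

A hand-picked range only covers the scripts you list, so a Unicode category such as `\p{L}` is usually the safer choice when the input language is not known in advance.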
kamp