0

I have the following regex:

foo = Regex.Replace(foo, @"[^a-zA-Z0-9\s-]", " ");

Currently, this removes Unicode characters. What regex can I use to remove all non-URL-friendly characters (e.g. : , < etc.), but allow Unicode and accented characters?

Thanks, Mark

Mark Richman
  • Could you give us a complete list of what characters you consider "non-URL friendly"? Why should, for instance, "less than" be unfriendly? – Hyperboreus May 23 '11 at 21:51
  • I don't know if this helps, but could you use HtmlEncode? http://msdn.microsoft.com/en-us/library/w3te6wfz.aspx. This should make all the necessary replacements for you. – Matthew Sanford May 23 '11 at 21:55
  • Sorry, there is also an equivalent for URL encoding... http://msdn.microsoft.com/en-us/library/zttxte6w.aspx – Matthew Sanford May 23 '11 at 21:59
  • I don't actually want to HTML Encode it because it's being used to construct yet another URL. The variable 'foo' is actually a search term, so I just need to explicitly yank anything that's not URL-friendly. – Mark Richman May 23 '11 at 22:01
  • Can you clarify what you mean by "Unicode character"? – Nathan Ryan May 23 '11 at 22:19
  • Technically, even the characters that you want to remove are Unicode characters (in a loose sense of the word "character"). Do you mean everything that is not ASCII? – Nathan Ryan May 23 '11 at 23:02
  • I want to remove from the input string everything that is not "safe" user input in any language. Giving this problem another look, I may want to just use Microsoft.Security.Application.AntiXss.GetSafeHtml(). – Mark Richman May 24 '11 at 00:47
  • answer is here http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c – suraj jain Dec 15 '11 at 13:02

2 Answers

3

How about, instead of using a negated class, you simply list the characters you don't want and replace those?

s/[:,<]*//g
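Since the question uses C#, the same deny-list idea can be sketched with Regex.Replace. This is only an illustration: the class and method names (DenyListDemo, StripListed) are made up, and the character class holds just the three characters from the question, so extend it with whatever else you consider unfriendly.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DenyListDemo
{
    // Hypothetical helper: removes only the characters listed in the class
    // and leaves everything else (including accented letters) untouched.
    public static string StripListed(string input)
    {
        // [:,<] matches a single ':', ',' or '<'. Regex.Replace already
        // replaces every match, so no '*' or global flag is needed.
        return Regex.Replace(input, "[:,<]", "");
    }

    public static void Main()
    {
        Console.WriteLine(StripListed("café: a<b, c")); // prints "café ab c"
    }
}
```

Anything not in the list passes through, so this is only as safe as the list is complete.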
Razor Storm
  • I was looking for something a little less specific than those three characters lol – Mark Richman May 23 '11 at 21:58
  • 4
    Is that really justification to downvote the question? I'm sorry this isn't what you wanted, but that wasn't specified in the question either. A simple comment would have sufficed. – Razor Storm May 23 '11 at 22:48
0

Microsoft.Security.Application.AntiXss.GetSafeHtml() solved my problem.
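For anyone who still wants a regex for the original question, here is a rough sketch that keeps the shape of the pattern above but swaps the ASCII ranges for Unicode categories, which .NET regexes support: \p{L} (letters), \p{M} (combining marks, assumed here so that decomposed accents survive) and \p{N} (numbers). The names SlugDemo and KeepLettersAndDigits are hypothetical, and this is not a sanitizer or a URL encoder; it only decides which characters get replaced by a space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugDemo
{
    // Hypothetical helper: replaces every character that is not a Unicode
    // letter, combining mark, number, whitespace or hyphen with a space.
    public static string KeepLettersAndDigits(string input)
    {
        // Same structure as [^a-zA-Z0-9\s-], with Unicode categories
        // in place of the ASCII-only ranges.
        return Regex.Replace(input, @"[^\p{L}\p{M}\p{N}\s-]", " ");
    }

    public static void Main()
    {
        // ':' and '<' become spaces; "é" and "日本語" are kept.
        Console.WriteLine(KeepLettersAndDigits("café: <日本語"));
    }
}
```

The result would normally still be passed through a URL-encoding function before being placed in a query string, since non-ASCII letters are kept as-is.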

Mark Richman