
I am using a RegularExpressionValidator control with the regular expression

[http(s)?://]*([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

to validate URLs. I need to allow German characters

(ä,Ä,É,é,ö,Ö,ü,Ü,ß)

in the URL. What exact regular expression would allow these characters?

stema
Sampath
  • Being native German, I never heard of a German character `É` or `é`. Actually, I heard of them in the French class in school. – Uwe Keim May 07 '11 at 21:54
  • @Uwe Keim Thank you for pointing this out. Do you know which German characters are allowed in domain names? – Sampath May 08 '11 at 03:36
  • If it is an "[Umlaut Domain](http://en.wikipedia.org/wiki/Internationalized_domain_name)", all of those you named should be allowed (except `É` and `é`). In addition, there is a rather new character: the upper-case version of `ß`. (See [this German Wikipedia section](http://de.wikipedia.org/wiki/%C3%9F#Gro.C3.9Fschreibweise_mit_Versal-Eszett) about the upper-case character.) – Uwe Keim May 08 '11 at 06:49

1 Answer


I hope you are aware that it is not easy to use regex for URL validation, because there are many valid variations of URLs. See for example this question.

First, your regex has several flaws (this is only from a quick check, so the list may not be complete).

See here for an online check on RegExr.

It does not match

http://RegExr.com?2rjl6]

Why do you allow only \w and - after the first dot?

but it does match

hhhhhhppth??????ht://stackoverflow.com

You define a character group at the beginning, [http(s)?://], which means "match any one of the characters inside". You probably want the non-capturing group (?:http(s)?://) followed by ? instead of *.
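To make the difference concrete, here is a small Python sketch. It uses `re.fullmatch` to approximate the validator's whole-string matching; that choice and the variable names are assumptions for illustration, not part of the original setup.

```python
import re

# Character class: any ONE of the characters h, t, p, (, s, ), ?, :, /
# and the trailing * repeats that choice zero or more times.
char_class = re.compile(r"[http(s)?://]*")

# Non-capturing group: the literal text "http://" or "https://", optional.
group = re.compile(r"(?:https?://)?")

garbage = "hhhhhhppth??????ht://"

class_accepts_garbage = char_class.fullmatch(garbage) is not None
group_accepts_garbage = group.fullmatch(garbage) is not None
group_accepts_http = group.fullmatch("http://") is not None
group_accepts_https = group.fullmatch("https://") is not None

print(class_accepts_garbage, group_accepts_garbage)
print(group_accepts_http, group_accepts_https)
```

The class accepts the garbage prefix because every character in it belongs to the set, while the group only accepts the two real scheme prefixes.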

To answer your question:

Create a character group with those letters and put it where you want to allow it.

[äÄÉéöÖüÜß]

Use it like this

(?:https?://)?([äÄÉéöÖüÜß\w-]+\.)+[äÄÉéöÖüÜß\w-]+(/[-äÄÉéöÖüÜß\w ./?%&=]*)?
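A quick way to try this expression is the Python sketch below. It assumes whole-string matching (via `re.fullmatch`) and sets `re.ASCII` so that `\w` covers only `[A-Za-z0-9_]`, as it does in JavaScript-based client-side validation; without that flag Python's `\w` would already accept umlauts. The helper `is_valid` and the sample URLs are hypothetical.

```python
import re

GERMAN = "äÄÉéöÖüÜß"

# Same shape as the expression above; re.ASCII restricts \w to ASCII.
with_german = re.compile(
    rf"(?:https?://)?([{GERMAN}\w-]+\.)+[{GERMAN}\w-]+(/[-{GERMAN}\w ./?%&=]*)?",
    re.ASCII,
)

# The same expression without the extra letters, for comparison.
without_german = re.compile(
    r"(?:https?://)?([\w-]+\.)+[\w-]+(/[-\w ./?%&=]*)?",
    re.ASCII,
)


def is_valid(pattern, url):
    """Return True only if the whole string matches the pattern."""
    return pattern.fullmatch(url) is not None


for url in (
    "http://müller.de",
    "http://straße.example/über uns",
    "hhhhhhppth??????ht://stackoverflow.com",
):
    print(url, is_valid(with_german, url), is_valid(without_german, url))
```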

Other hints

The - inside of a character group has to be at the start or the end or needs to be escaped.
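The effect of the hyphen position can be seen in this short Python sketch. The classes used here are made up for illustration, and other regex engines may treat some edge cases differently.

```python
import re

# Hyphen between two characters: a range from "+" (0x2B) to "." (0x2E),
# which silently includes "," (0x2C) and "-" (0x2D).
as_range = re.compile(r"[+-.]")

# Hyphen at the end: a literal "-", so the class is just "+", ".", "-".
at_end = re.compile(r"[+.-]")

# Hyphen escaped: also a literal "-", wherever it stands.
escaped = re.compile(r"[+\-.]")

range_accepts_comma = as_range.fullmatch(",") is not None
end_accepts_comma = at_end.fullmatch(",") is not None
escaped_accepts_comma = escaped.fullmatch(",") is not None
escaped_accepts_hyphen = escaped.fullmatch("-") is not None

print(range_accepts_comma, end_accepts_comma)
print(escaped_accepts_comma, escaped_accepts_hyphen)
```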

(s)? can be written as s?; the capturing group is not needed.

Community
stema
  • Excellent. My original issue is resolved. I changed the regex to (?:http(s)?://)?([äÄÉéöÖüÜß\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)? and it now allows German characters and no longer allows hhhhhhppth??????ht://stackoverflow.com. But you have pointed out a very important thing: it does not match http://RegExr.com?2rjl6] (a query string), which is a very normal URL. What changes do I need to make to the above regex to allow a query string? – Sampath May 08 '11 at 03:44
  • As I said in my first sentence, it's not trivial and has already been discussed several times here (I also provided a link with a solution there). A quick answer is: add the `?` to the character class, but I am sure there are then other valid URLs that will not be matched. Maybe a better link is this: [stackoverflow.com/questions/206059/php-validation-regex-for-url](http://stackoverflow.com/questions/206059/php-validation-regex-for-url). Look at the solutions there and add your additional characters. – stema May 08 '11 at 06:48
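One possible reading of the quick fix from the comments is to let the optional tail start with either `/` or `?`. The Python sketch below is only that: an assumption about where the `?` goes, not a complete URL validator. It uses `re.fullmatch` and `re.ASCII` to approximate whole-string matching with an ASCII-only `\w`, and the name `url_pattern` is hypothetical.

```python
import re

GERMAN = "äÄÉéöÖüÜß"

# The tail may now begin with "/" or "?" (assumed placement of the "?").
url_pattern = re.compile(
    rf"(?:https?://)?([{GERMAN}\w-]+\.)+[{GERMAN}\w-]+([/?][-{GERMAN}\w ./?%&=]*)?",
    re.ASCII,
)

query_only = url_pattern.fullmatch("http://RegExr.com?2rjl6") is not None
path_and_query = url_pattern.fullmatch("http://RegExr.com/path?x=1") is not None
garbage = url_pattern.fullmatch("hhhhhhppth??????ht://stackoverflow.com") is not None

print(query_only, path_and_query, garbage)
```

As noted in the comment above, other valid URLs (fragments, ports, user info and so on) will still be rejected by this expression.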