I am preparing for my exams and I am stuck on regex validation. I would like to validate an entered website URL. I've searched for a solution here, but have not found one that fulfils my needs. For example, these links should be validated:

and this should not:

At the moment, the closest expression I have is:

http://(www\.)([^\.]+)(\.com)(/([^\.]+)(\.html|\.aspx))?

It may be a little bit dirty, since this is my first time working with regexes.
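For reference, here is a minimal Java sketch of how the expression above behaves under a whole-string match versus a substring search. The class name `AskerRegexDemo`, the helper names, and the `example.com` sample URLs are hypothetical and are not the links from the question. An online tester that highlights matches typically behaves like the substring search.

```java
import java.util.regex.Pattern;

public class AskerRegexDemo {
    // The expression from the question, unchanged (backslashes doubled for a Java string).
    static final Pattern ASKER = Pattern.compile(
        "http://(www\\.)([^\\.]+)(\\.com)(/([^\\.]+)(\\.html|\\.aspx))?");

    // Whole-string match: every character of the input must be consumed.
    static boolean matchesWhole(String url) {
        return ASKER.matcher(url).matches();
    }

    // Substring search: succeeds if any part of the input matches.
    static boolean matchesSomewhere(String url) {
        return ASKER.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, chosen only for illustration.
        String[] samples = {
            "http://www.example.com",
            "http://www.example.com/docs/page.html",
            "http://www.example.com.au/page.aspx",
            "http://www.example.com/some/word"
        };
        for (String s : samples) {
            System.out.println(s + " whole=" + matchesWhole(s)
                + " somewhere=" + matchesSomewhere(s));
        }
    }
}
```

With these samples, the `.com.au` URL and the extension-less path fail the whole-string match, but the substring search still finds the `http://www.example.com` prefix inside them. That is one plausible reason a highlighting tester appears to accept them.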

But the regex tester (I am using regexpal) highlights/accepts:

What should be changed in my regex?

P.S. Sorry for such a long story, I am just a beginner.

Ank
asdewka
  • 2
    Can you formalize the rules of validation? Which urls should pass, which should not? – Sergio Tulentsev Dec 28 '11 at 22:53
  • Emm.. I am not sure what you mean by that, but I wrote the links which should be validated at the top. (Maybe I just did not get your question.) – asdewka Dec 28 '11 at 22:55
  • So, you have only these six links to validate? Or there may be other links passed to your program? – Sergio Tulentsev Dec 28 '11 at 22:57
  • I need links to be validated with anything before .com, and if there are path segments like /ttt/sss, they may end with .html|.aspx or have no extension (just .com/some/word) – asdewka Dec 28 '11 at 22:57
  • 1
    Why is this tagged both `java` and `C#`? – Andrew Barber Dec 28 '11 at 22:58
  • 6) should fail because of the spaces, but why should 5) //www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx fail? – jac Dec 28 '11 at 23:00
  • At the moment I have tried just these, because it is not a big, serious project, so they should be enough for explanation. Of course it would be nice to also validate links such as http://www.forum.ru-board.com or something like ebay.co.uk, but since I could not finish with those links, I have not even started thinking about that – asdewka Dec 28 '11 at 23:00
  • Beaner, just in my RegEx I did not have something like .com.au :) – asdewka Dec 28 '11 at 23:02
  • Is this a duplicate of http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url ? – Dawood ibn Kareem Dec 28 '11 at 23:31
  • @asdewka You should indicate exactly what language you are using these Regexes in, so that this question can be tagged appropriately. Different languages will use different Regex engines, so details will differ. – Andrew Barber Dec 28 '11 at 23:39
  • Andrew Barber, I can use either; it is up to me. However, right now I am studying C# – asdewka Dec 28 '11 at 23:50
  • Problem solved, thanks to David Wallace. – asdewka Dec 29 '11 at 00:14
  • Then I recommend this question be closed as a duplicate. –  Dec 29 '11 at 07:18

1 Answer

The only difference that I see is whether the URL has a multi-part top-level domain (like co.uk or com.au).

Therefore that is what I check for:

^.*www\.[a-zA-Z]*\.[a-zA-Z]{1,3}/([a-zA-Z].*|)

That just checks whether the URL has only a single TLD and, optionally, some more parts after it.

I do NOT validate whether it starts with http://, as that is not an actual requirement for a URL. I also do not check the document type (html or aspx), as that can vary.
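As a rough illustration, the check could be sketched in Java as follows. This is only a sketch under stated assumptions: the dots in the pattern are escaped so that they match literal dots only, the whole input is required to match (`matches()`), and the class name `SingleTldCheck`, the method name `hasSingleTld`, and the `example.com` URLs are hypothetical.

```java
import java.util.regex.Pattern;

public class SingleTldCheck {
    // The answer's pattern, with the dots escaped (an assumption of this sketch)
    // so that "." matches a literal dot rather than any character.
    static final Pattern SINGLE_TLD = Pattern.compile(
        "^.*www\\.[a-zA-Z]*\\.[a-zA-Z]{1,3}/([a-zA-Z].*|)");

    // Returns true when the whole input matches: "www.", a run of letters,
    // one dot, a 1-3 letter TLD, a slash, and an optional path starting with a letter.
    static boolean hasSingleTld(String url) {
        return SINGLE_TLD.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, chosen only for illustration.
        String[] samples = {
            "http://www.example.com/page.html",
            "www.example.com/",
            "http://www.example.com.au/page.aspx",
            "http://www.example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hasSingleTld(s));
        }
    }
}
```

Note that, as written, the pattern requires a slash after the TLD, so a bare `http://www.example.com` is rejected, and the `[a-zA-Z]*` part does not allow digits or hyphens in the host name.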

Myrtle