I have an application in which I stored a lot of website URLs without validating them. I now validate each URL as it is entered, but the URLs that were already stored remain as they are.

I want strict display code that also corrects user typos and gives me a proper URL to work with.

The data already in the system has a lot of typos, such as ...http://example.com, htp://example.com, or ttp://example.com. I want the code to handle these and produce the proper URL, either by stripping the invalid part with a regex or by correcting it.

What is the best approach to accomplish this?

user3328402
  • Are you just trying to correct the scheme part? If not, how do you, as a human being, determine the difference between valid and invalid URLs? If you can't describe that process, it's unlikely a computer can fix the problem. – Damien_The_Unbeliever Jun 11 '14 at 07:46
  • check this answer http://stackoverflow.com/questions/4835269/how-to-check-that-a-uri-string-is-valid – Yuliam Chandra Jun 11 '14 at 07:51

2 Answers

You can pick out the URLs that are already correct with a regex.

However, you will need to write your own logic to fix those that are 'broken'. You could select these with another regex and then search and replace the broken element. This approach has limitations: you can only really check the protocol prefix, not the domain part itself.
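As a rough illustration of that two-step idea, here is a minimal sketch in Python (the same regex should carry over to C# largely unchanged). The function name `repair_scheme`, the default scheme, and the heuristic "any short prefix made only of the letters h, t, p, s is a mistyped http/https" are all assumptions for this example, not something taken from the question. It only covers typos like the ones listed there.

```python
import re

# Optional leading junk (e.g. "..."), a short run of letters, then "://".
# The "2 to 6 letters" limit is an arbitrary assumption for this sketch.
_PREFIX = re.compile(r"^\W*([a-z]{2,6})\s*:\s*/+", re.IGNORECASE)


def repair_scheme(raw, default="http"):
    """Return raw with a best-guess http/https scheme (hypothetical helper)."""
    text = raw.strip()
    match = _PREFIX.match(text)
    if match is None:
        # No scheme-like prefix at all: drop stray punctuation, add the default.
        return f"{default}://{text.lstrip('./: ')}"
    guess = match.group(1).lower()
    if not set(guess) <= set("htps"):
        # Looks like some other scheme (e.g. ftp); leave it untouched.
        return text
    rest = text[match.end():]
    # Assumption: a trailing "s" in the mangled prefix means https was intended.
    scheme = "https" if guess.endswith("s") else "http"
    return f"{scheme}://{rest}"
```

For example, `repair_scheme("...http://example.com")`, `repair_scheme("htp://example.com")` and `repair_scheme("ttp://example.com")` all come back as `http://example.com`. As noted above, nothing here checks the domain part, and other typos (a missing colon, for instance) would need their own rules.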

ChrisBint

Here is my try:

https?://(www\.)?[a-zA-Z0-9\-./]+

where [a-zA-Z0-9\-./] lists all the characters that you want to allow users to use. Note that the dot after www is escaped so that it matches only a literal dot.
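To show how such a pattern could be used to separate valid entries from broken ones, here is a small Python sketch. It uses the pattern above with the dot after www escaped; the name `looks_valid` is hypothetical, and the character class is only as permissive as the one in this answer, so URLs with query strings or ports would be rejected.

```python
import re

# Pattern from this answer, with the dot after "www" escaped.
URL_RE = re.compile(r"https?://(www\.)?[a-zA-Z0-9\-./]+")


def looks_valid(url):
    """True if the whole string matches the pattern (hypothetical helper)."""
    # fullmatch anchors at both ends, so leading junk such as "..." fails.
    return URL_RE.fullmatch(url) is not None
```

Entries for which `looks_valid` returns False are the candidates for repair.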

P.S. Please be aware that if you are using this regex in C#, you need to double each backslash (\\) in a regular string literal, or use a verbatim string literal (@"..."); otherwise your expression might not work properly.

Hope it gets you started.

Robert J.