Could someone help us with a regular expression to detect repeated patterns inside a URL string? The goal is to detect malformed URLs.
For example, the following URLs are alright:
http://www.somewhere.com/help/content/21/23/en/
http://www.somewhere.com/help/content/21/24/en/
http://www.somewhere.com/help/content/21/64/en/
http://www.somewhere.com/help/content/21/65/en/
http://www.somewhere.com/help/content/21/67/en/
While these are incorrect and should be flagged:
http://www.somewhere.com/help/content/21/content/1/54/en/
http://www.somewhere.com/help/content/21/content/1/62/en/
http://www.somewhere.com/help/content/21/content/8/52/en/
These are incorrect because the path segment content appears twice. So far we have been solving this using parse_url and explode, but it looks quite inefficient!
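To make the parse_url and explode approach concrete, here is a minimal sketch of the kind of check we mean. The function name hasRepeatedSegment is hypothetical. Skipping purely numeric segments is an assumption added for illustration, since numbers may repeat legitimately.

```php
<?php
// Hypothetical helper (name chosen for illustration only): returns true when
// a non-numeric path segment occurs more than once in the URL's path.
function hasRepeatedSegment(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path component, or the URL could not be parsed
    }

    $seen = [];
    foreach (explode('/', trim($path, '/')) as $segment) {
        // Assumption: empty and purely numeric segments are ignored,
        // because IDs such as 21 may appear more than once in a valid URL.
        if ($segment === '' || ctype_digit($segment)) {
            continue;
        }
        if (isset($seen[$segment])) {
            return true; // this segment was already seen earlier in the path
        }
        $seen[$segment] = true;
    }

    return false;
}

var_dump(hasRepeatedSegment('http://www.somewhere.com/help/content/21/23/en/'));
var_dump(hasRepeatedSegment('http://www.somewhere.com/help/content/21/content/1/54/en/'));
```

With the example URLs above, the first call should report no repetition and the second should report one, since content appears twice in its path.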
Also, I'm aware that many valid URLs may legitimately repeat a number, or some other value, in the path, so any suggestions for handling this would be more than welcome.
Thanks a lot!
To better understand the issue, you can visit the following link and click on "Administrador MySQL":