I need a regular expression for VB.NET that removes all hyperlinks from a string, including the protocol (http or https), subdomains, the full document name, and querystring parameters. It should match links like:
- http://www.example.com
- http://www.example.com/
- https://www.example.com
- http://www.example.com/page.html?t=7
- http://example.com?q=test&sort=1
- www.example.com
- etc
Here's the string I'm working with in which all links need to be removed:
Dim description As String
description = "Deep purples blanket / wrap. It is gorgeous" & _
"in newborn photography. " & _
"layer" & _
"beneath the baby.....the possibilities are endless!" & _
"You will get this prop! " & _
"Gorgeous images using Lavender as a basket filler " & _
"Photo by Benbrook, TX" & _
"Imaging, Ontario" & _
"http://www.photo.com?t=3" & _
" www.photo.com" & _
" http://photo.com" & _
" https://photo.com" & _
" http://www.photo.nl?t=1&url=5" & _
"Photography Cameron, NC" & _
"Thank you so much ladies!!" & _
"The flower halos has beautiful items!" & _
"http://www.enchanting.etsy.com" & _
"LIKE me on FACEBOOK for coupon codes, and to see my full product line!" & _
"http://www.facebook.com/byme"
What I have now:
description = Regex.Replace(description, _
"((http|https|ftp)\://[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)", "")
It removes most links, but not links without a protocol, like www.example.com.
How can I alter my expression to include these links?
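One direction that might work, shown only as a minimal sketch: let the start of the pattern match either a protocol or a bare "www." prefix, then consume everything up to the next whitespace. The names LinkStripper and RemoveLinks and the sample text are hypothetical, and the simplified character class is an assumption, not the original expression.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkStripper
    ' Hypothetical helper; the name is not from the original code.
    Function RemoveLinks(input As String) As String
        ' A link starts with a protocol (http, https, ftp) or a bare "www."
        ' and runs until the next whitespace, double quote, or angle bracket.
        Dim pattern As String = "(?:(?:https?|ftp)://|www\.)[^\s""<>]+"
        Return Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Made-up sample text, for illustration only.
        Dim sample As String = "See www.example.com and http://example.com?q=test&sort=1 now"
        Console.WriteLine(RemoveLinks(sample))
    End Sub
End Module
```

Because the match only stops at whitespace, any text concatenated directly after a link with no space (as in several lines of the description string above) would be removed along with it. The spaces around a removed link are left in place, so a follow-up pass to collapse repeated spaces may be needed.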