My approach: Use a RegEx to extract the domain name. Then add http:// to the front and / to the end. Here's the RegEx:
^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))
Also see this answer to the question Extract root domain name from string. (It left me somewhat dissatisfied, although it pointed out the need to account for https, the port number, and user authentication info, none of which my RegEx handles.)
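For what it's worth, here is a rough sketch of how the pattern could be widened to cover those three cases. It is not part of my tested script: the names REGEXPR_EXT and GetHostName are made up for illustration, and the user-info and port parts are deliberately loose rather than a full URL grammar.

```vbscript
Option Explicit

' Hypothetical extension of the pattern (an assumption, not the tested RegEx):
'   ^(?:https?:\/\/)?       optional http:// or https:// scheme
'   (?:[^@\/\s]+@)?         optional user[:password]@ prefix
'   ([\w_]+(?:\.[\w_]+)+)   domain name, unchanged (capture group 1)
'   (?::\d+)?               optional :port
'   (?=(?:\/|$))            followed by a slash or the end of the string
Const REGEXPR_EXT = "^(?:https?:\/\/)?(?:[^@\/\s]+@)?([\w_]+(?:\.[\w_]+)+)(?::\d+)?(?=(?:\/|$))"

Function GetHostName(sUrl)
    Dim oRegex, oMatches
    Set oRegex = New RegExp
    oRegex.Pattern = REGEXPR_EXT
    oRegex.IgnoreCase = True
    oRegex.Global = False
    Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then
        GetHostName = oMatches(0).SubMatches(0)
    Else
        GetHostName = ""
    End If
End Function

WScript.Echo GetHostName("https://user:secret@www.test.com:8080/path")   ' prints www.test.com
WScript.Echo GetHostName("www.test.co.uk/?q=42")                         ' prints www.test.co.uk
```

The port and user info are matched but not captured, so the function still returns only the domain name.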
Here is an implementation in VBScript. I put the RegEx in a constant and defined a function named GetDomainName(). You should be able to incorporate that function in your ASP page like this:
normalizedUrl = "http://" & GetDomainName(url) & "/"
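One caveat with that one-liner: GetDomainName() returns an empty string when the input does not match, so the concatenation would yield "http:///". A minimal sketch of a guard follows; NormalizeUrl is a hypothetical helper name, and GetDomainName() is repeated from the test program only so the snippet runs on its own under cscript.

```vbscript
Option Explicit

Const REGEXPR = "^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))"

' Same logic as GetDomainName() in the test program, repeated so this runs alone.
Function GetDomainName(sUrl)
    Dim oRegex, oMatches
    Set oRegex = New RegExp
    oRegex.Pattern = REGEXPR
    oRegex.IgnoreCase = True
    GetDomainName = ""
    Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then GetDomainName = oMatches(0).SubMatches(0)
End Function

' Hypothetical helper: returns "" instead of "http:///" when nothing matches.
Function NormalizeUrl(sUrl)
    Dim sDomainName
    sDomainName = GetDomainName(sUrl)
    If sDomainName = "" Then
        NormalizeUrl = ""
    Else
        NormalizeUrl = "http://" & sDomainName & "/"
    End If
End Function

WScript.Echo NormalizeUrl("www.test.co.uk/?q=42")   ' prints http://www.test.co.uk/
WScript.Echo NormalizeUrl("http://www..test.com")   ' prints an empty line
```

In an ASP page you would drop the WScript.Echo lines and use Response.Write (or just assign the result) instead.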
You can also test my script from the command prompt by saving the code to a file named test.vbs and then passing it to cscript:
cscript test.vbs
Test Program
Option Explicit
Const REGEXPR = "^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))"
'                    ^^^^^^^^^   ^^^^^^   ^^^^^^^^^^       ^^^^
'                    A           B1       B2               C
'
' A  - An optional 'http://' scheme
' B1 - Followed by one or more word characters (letters, digits, or underscores)
' B2 - Followed by one or more occurrences of a string
'      that begins with a period that is followed by
'      one or more word characters, and
' C  - Terminated by a slash or the end of the string.
Function GetDomainName(sUrl)
    Dim oRegex, oMatches
    Set oRegex = New RegExp
    oRegex.Pattern = REGEXPR
    oRegex.IgnoreCase = True
    oRegex.Global = False
    Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then
        ' The first submatch is the captured domain name
        GetDomainName = oMatches(0).SubMatches(0)
    Else
        GetDomainName = ""
    End If
End Function
Dim Data : Data = _
    Array( _
        "xhttp://www.test.com" _
      , "http://www..test.com" _
      , "http://www.test.com." _
      , "http://www.test.com" _
      , "www.test.co.uk/" _
      , "www.test.co.uk/?q=42" _
      , "www.test.info/test-page.html" _
      , "www.test.gov/test-folder/test-page.html" _
      , ".www.test.co.uk/" _
    )
Dim sUrl, sDomainName
For Each sUrl In Data
    sDomainName = GetDomainName(sUrl)
    If sDomainName = "" Then
        WScript.Echo "[ ] [" & sUrl & "]"
    Else
        WScript.Echo "[*] [" & sUrl & "] => [" & sDomainName & "]"
    End If
Next
Expected Output:
[ ] [xhttp://www.test.com]
[ ] [http://www..test.com]
[ ] [http://www.test.com.]
[*] [http://www.test.com] => [www.test.com]
[*] [www.test.co.uk/] => [www.test.co.uk]
[*] [www.test.co.uk/?q=42] => [www.test.co.uk]
[*] [www.test.info/test-page.html] => [www.test.info]
[*] [www.test.gov/test-folder/test-page.html] => [www.test.gov]
[ ] [.www.test.co.uk/]