I have a huge list of blog URLs whose validity I need to check. I've knocked together a script from this answer and from here.

Here is my script:

$siteURL = 'http://example.com/'
$File    = '.\urls.txt'

$NewContent = Get-Content -Path $File | ForEach-Object {
    $_

    $HTTP_Request  = [System.Net.WebRequest]::Create($siteURL + $_)
    $HTTP_Response = $HTTP_Request.GetResponse()
    $HTTP_Status   = [int]$HTTP_Response.StatusCode

    if ($HTTP_Status -eq 200) {
       " - 200"
    } else {
        " - " + $HTTP_Status
    }

    $HTTP_Response.Close()
}

$NewContent | Out-File -FilePath $File -Encoding Default -Force

My issue is that when it gets to a 404 error it doesn't add this to the file and returns the following error in the console:

Exception calling "GetResponse" with "0" argument(s): "The remote server
returned an error: (404) Not Found."
At C:\Users\user.name\urlcheck.ps1:19 char:9
+         $HTTP_Response = $HTTP_Request.GetResponse()
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : WebException

Why am I getting this error?

Bonus question: my "200 - OK" responses are getting added to a new line, why?

Burgi

2 Answers

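GetResponse() throws a WebException whenever the server answers with an error status such as 404, which is why your script stops at that point. To handle a 404 response (and similar error responses), we need a bit of error handling code: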

ForEach-Object {
    $_

    $HTTP_Response = $null   # reset so a failed request cannot reuse the previous response
    $HTTP_Request  = [System.Net.WebRequest]::Create($siteURL + $_)

    try {
        $HTTP_Response = $HTTP_Request.GetResponse()
    }
    catch [System.Net.WebException] {
        # HTTP error, grab response from exception
        # (Response is $null if the server never answered, e.g. DNS failure or timeout)
        $HTTP_Response = $_.Exception.Response
    }
    catch {
        # Something else went horribly wrong, maybe abort?
    }

    if ($HTTP_Response) {
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        " - " + $HTTP_Status
        $HTTP_Response.Close()
    }
    else {
        " - no response"
    }
}
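
One caveat with the handler above: a WebException does not always carry a response. When the server never answers (a DNS failure or a timeout, for example), its Response property is null, so the status lookup needs a guard. The following is a minimal offline sketch, assuming a hypothetical helper named Get-StatusText and simulated exceptions in place of real requests:

```powershell
# Hypothetical helper (not part of the original script): runs a script block
# and turns the outcome into a short status value.
function Get-StatusText {
    param([scriptblock]$Action)

    try {
        & $Action
    }
    catch [System.Net.WebException] {
        $response = $_.Exception.Response
        if ($null -ne $response) {
            # The server answered with an error status such as 404
            [int]$response.StatusCode
        }
        else {
            # No response at all (DNS failure, timeout, refused connection)
            "no response: $($_.Exception.Message)"
        }
    }
    catch {
        # Anything that is not a WebException ends up here
        "unexpected error: $($_.Exception.Message)"
    }
}

# Simulated outcomes, so the sketch runs without network access
$success    = Get-StatusText { 200 }
$webFailure = Get-StatusText { throw (New-Object System.Net.WebException 'simulated DNS failure') }
$otherError = Get-StatusText { throw 'boom' }

$success, $webFailure, $otherError
```

The typed catch block only sees WebException; everything else falls through to the generic one, which is the same split the answer's code relies on.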

Bonus question: my "200 - OK" responses are getting added to a new line, why?

That's because you output $_ and " - " + ... in two separate statements. Each statement emits its own object, and Out-File writes every object on its own line. Remove the $_ from the top and combine it all in a single string:

ForEach-Object {
    $HTTP_Response = $null
    $HTTP_Status   = 'no response'   # fallback if no response is ever received
    $HTTP_Request  = [System.Net.WebRequest]::Create($siteURL + $_)

    try {
        $HTTP_Response = $HTTP_Request.GetResponse()
    }
    catch [System.Net.WebException] {
        # HTTP error, grab response from exception
        $HTTP_Response = $_.Exception.Response
    }
    catch {
        # Something else went horribly wrong, maybe abort?
    }
    finally {
        # Grab status code and dispose of response stream (if there is one)
        if ($HTTP_Response) {
            $HTTP_Status = [int]$HTTP_Response.StatusCode
            $HTTP_Response.Dispose()
        }
    }

    "$_ - $HTTP_Status"
}
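
To see why the two statements end up on separate lines, it can help to count the objects each variant sends down the pipeline. This is a small offline sketch; the names page1 and page2 are placeholder values, not URLs from the question:

```powershell
$urls = 'page1', 'page2'

# Two output statements per item: each one becomes its own pipeline object
$separate = $urls | ForEach-Object {
    $_
    " - 200"
}

# One interpolated string per item: one object, hence one line in the file
$combined = $urls | ForEach-Object {
    "$_ - 200"
}

"Separate statements: $(@($separate).Count) objects"
"Combined string:     $(@($combined).Count) objects"
```

With two input items, the first variant produces four objects and the second produces two, which matches the number of lines Out-File would write in each case.
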
Mathias R. Jessen

The .NET API is awkwardly designed here: WebRequest.GetResponse() throws a WebException for error status codes such as 404 instead of simply returning the response.

Based on this answer, you can use the following workaround:

$siteURL = 'http://example.com/'
$file    = '.\urls.txt'

(Get-Content $file) | ForEach-Object {
    $url           = $_      # inside catch, $_ is the error record, so keep the URL here
    $HTTP_Response = $null
    try {
        $HTTP_Request  = [System.Net.WebRequest]::Create($siteURL + $url)
        $HTTP_Response = $HTTP_Request.GetResponse()
    }
    catch [System.Net.WebException] {
        # catch this specific exception and get the response from it
        $HTTP_Response = $_.Exception.Response
    }
    catch {
        # for other errors, output the error message and move on to the next URL
        # (return, not continue: ForEach-Object is a cmdlet, not a loop)
        "{0} - ERROR: {1}" -f $url, $_.Exception.Message
        return
    }

    if (-not $HTTP_Response) {
        # WebException without a response (DNS failure, timeout, ...)
        "{0} - ERROR: no response" -f $url
        return
    }

    # read the status before disposing the response
    $HTTP_Status = $HTTP_Response.StatusCode
    $HTTP_Response.Dispose()   # standard handling of IDisposable

    # NOTE: This will also fix your "newline" problem
    "{0} - {1} ({2})" -f $url, [int]$HTTP_Status, $HTTP_Status
} | Out-File $file -Force
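
Two details matter when a try/catch sits inside ForEach-Object. Inside the catch block, $_ refers to the error record rather than the current pipeline item, so the item has to be saved in a variable first. And because ForEach-Object is a cmdlet rather than a loop, return is what skips to the next item, whereas continue typically stops the whole pipeline. A minimal offline sketch, with made-up input values and a simulated failure standing in for a failed request:

```powershell
$results = 'good', 'bad', 'also-good' | ForEach-Object {
    $item = $_          # capture the pipeline item before entering try/catch
    try {
        if ($item -eq 'bad') { throw 'simulated failure' }
    }
    catch {
        # Here $_ is the ErrorRecord, so use $item for the original value
        "{0} - ERROR: {1}" -f $item, $_.Exception.Message
        return          # ends this iteration only; the next item is still processed
    }
    "$item - OK"
}

$results
```

All three inputs yield a line of output, including the one after the failure, which shows that return does not cut the pipeline short.
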
marsze