After 30,000 to 40,000 tests I noticed that you run into a lot of different situations, each of which has to be handled.
The starting point is of course to look only at the rel attribute of the link tag in the HTML and fetch whatever it points to, but along the way you will find more and more situations you have to cover.
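As a rough illustration of that starting point, here is a minimal sketch in Python (not the PHP from my plugin). The names `IconLinkParser` and `find_icon_urls` are made up for this example, and it only handles the simple case: link tags whose rel contains the token "icon", resolved against the page URL or a base tag if one is present.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel contains the token 'icon'."""

    def __init__(self):
        super().__init__()
        self.base_href = None   # first <base href="..."> seen, if any
        self.icon_hrefs = []    # raw href values, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "base" and self.base_href is None and attributes.get("href"):
            self.base_href = attributes["href"]
        elif tag == "link":
            # rel is a space-separated token list, e.g. "shortcut icon"
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "icon" in rel_tokens and attributes.get("href"):
                self.icon_hrefs.append(attributes["href"])


def find_icon_urls(html, page_url):
    """Return absolute icon URLs declared in the HTML (hypothetical helper)."""
    parser = IconLinkParser()
    parser.feed(html)
    # A <base> tag changes how relative hrefs are resolved.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.icon_hrefs]
```

This deliberately ignores variants such as apple-touch-icon and everything that can go wrong when fetching the result, which is exactly where the extra situations start.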
In case anyone looks at this thread and wants to get closer to 100% perfection, I uploaded my (PHP) code here: https://plugins.svn.wordpress.org/wp-favicons/trunk/includes/server/class-http.php. It is part of a (GPL) WordPress plugin that retrieves favicons, written more or less on request back then because of the limitations of the standard Google one (as mentioned above). The code finds substantially more icons than Google's does. It also includes Google and others as image providers, to shortcut further iterations when trying to retrieve the icon.
When you read through the code you will see some of the situations you are likely to encounter, e.g. base64 data URIs, pages redirecting to 404 pages or redirecting a gazillion times, receiving weird HTTP status codes and having to check every possible HTTP return code for validity, icons that are served with the wrong MIME type, client-side refresh tags, icons that sit in the root folder with nothing in the HTML pointing to them, etc.
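To make a few of those cases concrete, here is a small Python sketch (again, not the plugin's PHP). The helper names `decode_data_uri`, `sniff_image_type` and `root_favicon_url` are hypothetical, the signature list is deliberately short, and it makes no network requests; it only shows the idea of decoding a base64 data URI, checking the actual bytes instead of trusting the declared MIME type, and falling back to the icon in the root folder.

```python
import base64
import re
from urllib.parse import unquote_to_bytes, urlsplit, urlunsplit

# Leading bytes of a few common icon formats (not an exhaustive list).
MAGIC_SIGNATURES = [
    (b"\x00\x00\x01\x00", "image/x-icon"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_image_type(data):
    """Guess the image type from the bytes; None if nothing matches."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def decode_data_uri(uri):
    """Return the payload bytes of a data: URI, or None if it is not usable."""
    match = re.match(r"data:([^,]*),(.*)", uri, re.DOTALL | re.IGNORECASE)
    if not match:
        return None
    header, payload = match.groups()
    raw = unquote_to_bytes(payload)  # the payload may be percent-encoded
    if header.lower().endswith(";base64"):
        try:
            return base64.b64decode(raw, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            return None
    return raw


def root_favicon_url(page_url):
    """Fallback location when the HTML declares no icon at all."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/favicon.ico", "", ""))
```

A server that answers /favicon.ico with an HTML error page and a 200 status is caught by `sniff_image_type` returning None, which is why checking the bytes matters more than checking the Content-Type header.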
If you go up a directory you will find other classes that are meant to store the actual icons against their URLs. Of course you will then need to work out which "branches" share the same favicon and which do not, and whether they belong to the same "owner" or are really different sites living under the same domain.