What I want to do: Scrape all the links from a page using Simple HTML DOM, taking care to get full links (i.e. from the http:// all the way to the end of the address).
My problem: I get links like /wiki/Cell_wall instead of http://en.wikipedia.org/wiki/Cell_wall.
More examples: if I scrape the URL http://en.wikipedia.org/wiki/Leaf, I get links like /wiki/Cataphyll and //en.wikipedia.org/. Or if I'm scraping http://php.net/manual/en/function.strpos.php, I get links like function.strripos.php.
I've tried many different techniques for building the actual full URL, but there are so many possible cases that I'm at a loss as to how to cover them all. However, I'm sure plenty of people have run into this problem before, which is why I turn to you!
P.S. I suppose this question could almost be reduced to just handling local hrefs, but as mentioned above, I've also come across //en.wikipedia.org/, which is not a full URL and yet is not local.
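For reference, the behavior I'm after is standard relative-URL resolution (RFC 3986): every scraped href gets resolved against the URL of the page it came from. This sketch is in Python rather than PHP purely to illustrate the rules, since Python's standard library has them built in as urllib.parse.urljoin; it covers all three cases from the examples above (root-relative, protocol-relative, and document-relative links):

```python
# Illustration only (Python, not Simple HTML DOM): resolve scraped hrefs
# against the URL of the page they were scraped from.
from urllib.parse import urljoin

# Root-relative href: keeps the scheme and host of the base page.
print(urljoin("http://en.wikipedia.org/wiki/Leaf", "/wiki/Cataphyll"))
# → http://en.wikipedia.org/wiki/Cataphyll

# Protocol-relative href (starts with //): inherits only the scheme.
print(urljoin("http://en.wikipedia.org/wiki/Leaf", "//en.wikipedia.org/"))
# → http://en.wikipedia.org/

# Document-relative href: resolved against the base page's directory.
print(urljoin("http://php.net/manual/en/function.strpos.php",
              "function.strripos.php"))
# → http://php.net/manual/en/function.strripos.php
```

In PHP there is no built-in equivalent of urljoin, but the same resolution rules apply: an absolute href is kept as-is, a `//host/...` href inherits the page's scheme, a `/path` href inherits scheme and host, and anything else is resolved against the directory portion of the page's path.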