I'm currently building a kind of crawler/proxy. It can navigate another website while the user stays on my site the whole time. While a page is loading, I'd also like to extract all of its links and data at the same time.
The website contains many `<tr>` elements, and each of them contains a lot of other markup.
Here is one of the many rows on the website:
```html
<tr>
  <td class="vertTh">
    <center>
      <a href="/s/browse/other.php">Other</a>
      <br>
      <a href="/s/browse/documents.php">Document</a>
    </center>
  </td>
  <td>
    <div class="Name">
      <a href="/s/database/Document_Title_Info" class="Link">Document Title Info</a>
    </div>
    <a href="http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
      <img src="/static/img/icon-source.png" alt="Source">
    </a>
    <font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>
  </td>
  <td align="right">67</td>
  <td align="right">9</td>
</tr>
```
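The size is not in an element of its own; it is part of the text inside `<font class="Desc">`. A minimal sketch for splitting that text into its parts, assuming it always follows the pattern in the sample row (`Uploaded <date>, Size <number> <unit>, ULed by <user>`). `parseDescription` is a hypothetical helper name, not an existing function:

```php
<?php
// Hypothetical helper: split the text of a <font class="Desc"> element, e.g.
// "Uploaded 03-24 14:02, Size 267.35 KB, ULed by username", into its parts.
// Assumes the site always uses exactly this wording.
function parseDescription(string $desc): ?array
{
    $pattern = '/Uploaded\s+(.+?),\s*Size\s+([\d.]+)\s*(\w+),\s*ULed by\s+(.+)/';
    if (!preg_match($pattern, $desc, $m)) {
        return null; // the text did not match the expected format
    }
    return [
        'uploaded' => trim($m[1]),
        'size'     => $m[2] . ' ' . $m[3],
        'uploader' => trim($m[4]),
    ];
}

$info = parseDescription('Uploaded 03-24 14:02, Size 267.35 KB, ULed by username');
echo $info['size']; // prints "267.35 KB"
```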
Users browse the proxy site, and while they do, it collects information from the original website. I've figured out how to get the string between two substrings, but I don't know how to turn this into a `foreach` loop or something similar.
For example, to get the source link, I do something like this:
```php
<?php
// Fetch the page the user is currently browsing through the proxy
$url = $_GET['url'];
$str = file_get_contents('https://database.com/' . $url);

// Extracts only the FIRST source link on the page
$source = 'http://example.com/source/to/' . getStringBetween($str, 'example.com/source/to/', '" title="Source">');
// Output: http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters

// Returns the text between the first occurrence of $from and the next occurrence of $to
function getStringBetween($str, $from, $to)
{
    $sub = substr($str, strpos($str, $from) + strlen($from));
    return substr($sub, 0, strpos($sub, $to));
}
```
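The string approach can be extended to return every match by passing an offset to `strpos()` (its optional third argument) and continuing the search after each hit. A sketch under that idea; `getAllStringsBetween` is a hypothetical name, and it still depends on the exact `" title="Source">` text, so any change in the markup breaks it:

```php
<?php
// Hypothetical helper: like getStringBetween(), but keeps searching after each
// match and returns every string found between $from and $to.
function getAllStringsBetween(string $str, string $from, string $to): array
{
    $results = [];
    $offset = 0;
    // strpos() returns false once there are no further occurrences of $from
    while (($start = strpos($str, $from, $offset)) !== false) {
        $start += strlen($from);
        $end = strpos($str, $to, $start);
        if ($end === false) {
            break; // opening marker without a closing marker: stop here
        }
        $results[] = substr($str, $start, $end - $start);
        $offset = $end + strlen($to); // continue searching after this match
    }
    return $results;
}

$html = '<a href="http://example.com/source/to/a" title="Source">'
      . '<a href="http://example.com/source/to/b%20c" title="Source">';

foreach (getAllStringsBetween($html, 'example.com/source/to/', '" title="Source">') as $path) {
    echo 'http://example.com/source/to/' . $path . "\n";
}
```

This gives the `foreach` loop, but names, source links and sizes would come out as separate lists that have to be paired up by position, which goes wrong as soon as one row lacks one of the values.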
But this only returns the first match, and the page contains many of these rows. Is there a way to get the source link, name, and size from every row?
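A more robust option is to stop treating the page as a string and parse it with PHP's built-in DOM extension (`DOMDocument` plus `DOMXPath`), then loop over the `<tr>` elements and read all three values relative to each row. This is a sketch under assumptions: the XPath expressions rely on the `Name` and `Desc` classes and the `title="Source"` attribute seen in the sample row, `extractRows` is a hypothetical name, and the two sample rows stand in for the string returned by `file_get_contents()`.

```php
<?php
// Sketch: parse the HTML with PHP's DOM extension and read the name, source
// link and size from every <tr>. The XPath expressions assume the markup of
// the sample row (class="Name", class="Desc", title="Source").
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Visit only the rows that contain a document title
    foreach ($xpath->query('//tr[.//div[@class="Name"]]') as $tr) {
        // Each query is relative to the current row ($tr is the context node)
        $name   = $xpath->query('.//div[@class="Name"]/a', $tr)->item(0);
        $source = $xpath->query('.//a[@title="Source"]', $tr)->item(0);
        $desc   = $xpath->query('.//font[@class="Desc"]', $tr)->item(0);

        // The size is embedded in text like "Uploaded 03-24 14:02, Size 267.35 KB, ..."
        $size = null;
        if ($desc !== null && preg_match('/Size\s+([\d.]+\s*\w+)/', $desc->textContent, $m)) {
            $size = $m[1];
        }

        $rows[] = [
            'name'   => $name !== null ? trim($name->textContent) : null,
            'source' => $source !== null ? $source->getAttribute('href') : null,
            'size'   => $size,
        ];
    }
    return $rows;
}

// Two sample rows standing in for the page fetched with file_get_contents()
$html = <<<HTML
<table>
<tr><td><div class="Name"><a href="/s/database/Document_Title_Info" class="Link">Document Title Info</a></div>
<a href="http://example.com/source/to/document/which%20can%20be%20very%20long" title="Source"><img src="/static/img/icon-source.png" alt="Source"></a>
<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font></td></tr>
<tr><td><div class="Name"><a href="/s/database/Second_Doc" class="Link">Second Doc</a></div>
<a href="http://example.com/source/to/second" title="Source"><img src="/static/img/icon-source.png" alt="Source"></a>
<font class="Desc">Uploaded 03-25 09:10, Size 1.2 MB, ULed by <a class="Desc" href="/s/user/other/" title="Browse other">other</a></font></td></tr>
</table>
HTML;

foreach (extractRows($html) as $row) {
    echo $row['name'], ' | ', $row['source'], ' | ', $row['size'], "\n";
}
```

In the proxy, the `$str` fetched with `file_get_contents()` could be passed in place of `$html`. Because every value is looked up inside its own `<tr>`, a row that is missing, say, a source link yields `null` for that field instead of shifting the results of the other rows.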