I'm using a script, written in PHP and jQuery, that scrapes a static website. The PHP part:
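```php
<?php
if (isset($_GET['site'])) {
    // $_GET['site'] is used as-is here; this is the value I want to validate
    $f = fopen($_GET['site'], 'r');
    if ($f === false) {
        // fopen() returns false on failure; calling feof() on an invalid
        // handle can make the loop below run forever
        exit('Could not open the requested site.');
    }
    $html = '';
    while (!feof($f)) {
        $html .= fread($f, 24000);
    }
    fclose($f);
    echo $html;
}
?>
```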
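The read loop has no upper bound on how long it waits for a slow server or how much data it reads, which can tie up the server. A minimal sketch of a bounded fetch using `file_get_contents()` with an HTTP stream context; the name `fetch_limited` and the 1 MB / 5 second defaults are hypothetical choices. It does not validate the URL: `file_get_contents()` accepts local paths just like `fopen()`, so a URL check is still needed before calling it.

```php
<?php
// Hypothetical helper: read a URL with a time limit and a size cap,
// instead of looping over fread() with no bounds.
function fetch_limited(string $url, int $maxBytes = 1048576, float $timeout = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'         => $timeout, // seconds to wait for the remote server
            'follow_location' => 0,        // don't follow redirects to URLs that were never checked
        ],
    ]);

    // Reads at most $maxBytes bytes; '@' suppresses the warning so the
    // caller can handle the false return value instead.
    $html = @file_get_contents($url, false, $context, 0, $maxBytes);

    return $html === false ? null : $html;
}
```

Turning off `follow_location` matters because a URL that passes a check can redirect somewhere that would not.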
The jQuery part:
```javascript
// Runs once when the DOM is ready
$(function(){
    // Read the URL typed into the input field
    var site = $('input').val();
    $.get('proxy.php', { site: site }, function(data){
        $('#myDiv').append(data);
    }, 'html');
});
```
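`.append(data)` inserts the scraped page as live HTML, and jQuery executes inline `<script>` elements in appended HTML, so code from the scraped site would run on your page. If only the page's markup or text needs to be shown, one option is to escape it on the server before echoing it. A minimal sketch; `escape_for_display` is a hypothetical name.

```php
<?php
// Hypothetical helper: turn scraped HTML into inert text so that
// <script> tags and event-handler attributes are displayed, not executed.
function escape_for_display(string $html): string
{
    // ENT_QUOTES escapes both " and '; ENT_SUBSTITUTE replaces invalid UTF-8
    return htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// In the proxy script this would replace the plain echo:
// echo escape_for_display($html);
```

A client-side alternative with the same effect is `$('#myDiv').text(data)`, which inserts the response as text rather than HTML (use one or the other, not both, or the text ends up escaped twice).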
As you can see, the URL of the website to be scraped is taken from the value of an input field. I want to let my visitors enter their own website to be scraped.
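Whatever a visitor types reaches `$_GET['site']` unchanged: jQuery URL-encodes `{ site: site }` into the query string and PHP decodes it again. The PHP script can also be requested directly with any `site` parameter, without going through the input at all. A small sketch of that round trip, using `http_build_query()` to stand in for jQuery's encoding and `parse_str()`, which parses a string as if it were the query string of a URL; `simulate_get` is a hypothetical name.

```php
<?php
// Simulate the request: encode { site: value } as a query string,
// then parse it back into an array the way PHP fills $_GET.
function simulate_get(string $typedValue): array
{
    // For 'file:///etc/passwd' this is "site=file%3A%2F%2F%2Fetc%2Fpasswd"
    $query = http_build_query(['site' => $typedValue]);
    parse_str($query, $get);
    return $get;
}

// Nothing restricts the value to web pages: a local path or an internal
// address arrives exactly as typed and would be passed to fopen().
foreach (['http://example.com/', 'file:///etc/passwd', 'http://127.0.0.1/'] as $typed) {
    $get = simulate_get($typed);
    echo $get['site'], PHP_EOL;
}
```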
The problem is that I can't figure out how to secure the PHP part. As I understand it, the input value is a big security risk because anything can be sent as the value. I have already experienced slow performance and several "PC crashes" while working with this code. I'm not sure whether the crashes are related, but they only happen when I work on this code. Anyway, I would really like to know how to validate the value (from the input) that is sent to my server; only REAL URLs should be allowed. I have googled for days but I can't figure it out (I'm new to PHP).
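A minimal sketch of a "real URL" check, assuming only `http` and `https` pages should be scraped; `is_allowed_url` is a hypothetical name. `filter_var()` with `FILTER_VALIDATE_URL` checks the syntax, but it also accepts other schemes such as `file://` or `ftp://`, which `fopen()` would open too, so the scheme is whitelisted separately with `parse_url()`.

```php
<?php
// Hypothetical validator: accept only absolute http(s) URLs with a host.
function is_allowed_url(string $url): bool
{
    // Reject strings that are not syntactically valid URLs
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // filter_var() alone lets through schemes like file:// and ftp://,
    // so allow only http and https
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return false;
    }

    return parse_url($url, PHP_URL_HOST) !== null;
}

// In the proxy script, check before opening anything:
// if (!isset($_GET['site']) || !is_allowed_url($_GET['site'])) {
//     http_response_code(400);
//     exit('Invalid URL');
// }
```

This only checks the form of the URL. `FILTER_VALIDATE_URL` accepts ASCII URLs only, so internationalised domain names would need converting first (for example with `idn_to_ascii()` from the intl extension), and a well-formed URL such as `http://127.0.0.1/` still passes.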
P.S. If you spot any other security risks, please let me know.
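One risk a scheme check does not cover: a valid `http` URL can still point at machines only the server can reach (localhost, the local network, or cloud metadata addresses such as `169.254.169.254`), usually called server-side request forgery (SSRF). A minimal sketch of a check on the resolved IPv4 addresses, using PHP's `FILTER_FLAG_NO_PRIV_RANGE` and `FILTER_FLAG_NO_RES_RANGE` filter flags; the function names are hypothetical. The check and the later fetch resolve the host name separately, so a hostile DNS server could answer differently the second time (DNS rebinding); treat this as one layer, not a complete defence.

```php
<?php
// Hypothetical helper: true only for IP addresses outside the
// private and reserved ranges.
function is_public_ip(string $ip): bool
{
    // IPv4, per the PHP manual: NO_PRIV_RANGE rejects 10.0.0.0/8,
    // 172.16.0.0/12 and 192.168.0.0/16; NO_RES_RANGE rejects 0.0.0.0/8,
    // 127.0.0.0/8, 169.254.0.0/16 and 240.0.0.0/4.
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) !== false;
}

// Hypothetical helper: resolve the URL's host and require every IPv4
// address it maps to to be public (gethostbynamel() returns IPv4 only).
function host_is_public(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host, e.g. file:///etc/passwd
    }
    $ips = gethostbynamel($host);
    if ($ips === false) {
        return false; // the name did not resolve
    }
    foreach ($ips as $ip) {
        if (!is_public_ip($ip)) {
            return false;
        }
    }
    return true;
}
```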