
I am trying to build a kind of page cache so that I can later display pages as they used to be (if they are changed or deleted). So I'm pulling a whole page's HTML (Craigslist ads) into a database field.

I'm using file_get_contents because it is easy and simple enough for what I need. There is more to it than this, but this is the basis of what I've done:

$page = file_get_contents('http://annapolis.craigslist.org/hea/3652436359.html');
// $page = mysql_real_escape_string($page);
// $page = htmlspecialchars($page);
// $page = htmlentities($page);
mysql_query("INSERT INTO `page` (`html`) VALUES ('$page')");

I have tried every built-in PHP sanitization function I could find:

  • mysql_real_escape_string
  • htmlspecialchars
  • htmlentities

None of these sanitizes the page enough for it to be inserted into a MySQL database; MySQL throws a syntax error every time. Someone told me to just base64-encode the HTML and insert that, but I want to be able to search the HTML in the database, so that won't work for what I need.
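The syntax error comes from the page's own quote characters: the HTML is pasted between the `'...'` delimiters of the SQL literal, so the first `'` in the page (in an attribute, or in text like "don't") ends the string early. The sketch below reproduces this with PDO and an in-memory SQLite database, an assumption made only so it runs without a MySQL server; the sample HTML is made up. It then shows that quoting the value with `PDO::quote()` lets the same insert succeed.

```php
<?php
// Assumption: in-memory SQLite stands in for the question's MySQL database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE page (html TEXT)');

// Hypothetical sample standing in for the fetched Craigslist page.
$page = "<p class='title'>Don't miss this ad</p>";

// 1. Naive interpolation: the first ' inside $page closes the SQL literal.
$naiveFailed = false;
try {
    $db->exec("INSERT INTO page (html) VALUES ('$page')");
} catch (PDOException $e) {
    $naiveFailed = true; // SQLite reports a syntax error, as MySQL did
    echo "Naive insert failed: ", $e->getMessage(), PHP_EOL;
}

// 2. Escaped: PDO::quote() wraps the value in quotes and escapes embedded ones.
$db->exec('INSERT INTO page (html) VALUES (' . $db->quote($page) . ')');

// Only the escaped insert stored a row; read it back and compare.
$stored = $db->query('SELECT html FROM page')->fetchColumn();
echo $stored === $page ? "Stored HTML round-trips unchanged" : "Mismatch", PHP_EOL;
```

Escaping works, but it has to be applied every time; the prepared-statement route suggested in the comments avoids building SQL from page content at all.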

I've tried a variety of other things, such as nesting two of these functions (one inside the other), but I can't seem to get it to work.

Any and all help would be greatly appreciated.

  • Why don't you store them in plain files instead of a database? If you really want them in the database, could you use `json_encode`? But I would suggest plain files, storing only the file name in the DB. – Mihai Iorga Mar 04 '13 at 06:16
  • The first thing you should do is **stop using the MySQL extension**. Please read http://stackoverflow.com/questions/12859942/why-shouldnt-i-use-mysql-functions-in-php#answer-12860140 – Phil Mar 04 '13 at 06:18
  • I want to be able to use an SQL query to search the HTML for specific keywords. That is why I really would like to have it in the database. – Ricky Lipe Mar 04 '13 at 06:18
  • @Phil, although very true, it's painful for these new members to be bombarded with "DON'T USE THE MySQL EXTENSION!" Maybe offer a VERY SIMPLE TUTORIAL on why it's wrong, most commonly as a tinyurl.com link. – grepsedawk Mar 04 '13 at 06:19
  • What error do you receive exactly? Try using PDO's or mysqli's prepared statements. I've always saved HTML pages in a DB without any issues (and htmlentities is useless here, by the way). – Damien Pirsy Mar 04 '13 at 06:19
  • @RickyLipe That's a terrible reason to use an RDBMS. Why not something like [ElasticSearch](http://www.elasticsearch.org/) – Phil Mar 04 '13 at 06:19
  • @Pachonk The SO [question](http://stackoverflow.com/q/12859942/283366) and answers I've linked to provide plenty of information on why you shouldn't use it and the available alternatives – Phil Mar 04 '13 at 06:20
  • @RickyLipe Also, what is the syntax error? – Phil Mar 04 '13 at 06:25
  • @Phil, since I'm clearly not an expert (which you've made it so clear to point out), should I learn PDO or MySQLi? Which do you, personally, prefer? – Ricky Lipe Mar 04 '13 at 06:25
  • @RickyLipe I prefer PDO as it's a consistent interface for multiple DB servers and the parameter binding methods are much easier to understand – Phil Mar 04 '13 at 06:27
  • @Phil: Call me a complete retard if you must, but just from the first glance at the documentation, it's talking about a class. Is PDO a built-in class? – Ricky Lipe Mar 04 '13 at 06:31
  • @Phil, I apologize, it kinda blended in with your name, so I didn't see it. – grepsedawk Mar 04 '13 at 07:47
  • Just FYI, scraping craigslist is against their terms of use, and they've been known to sue people for similar activities. (Although long before it comes to that, the IP address doing the scraping will likely be automatically blocked.) If you're planning to build a public service based on this, you might want to rethink. – Nathan Stretch Mar 10 '13 at 00:38
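Several comments point toward PDO prepared statements, and PDO is indeed a class that ships with PHP as a bundled extension, so nothing beyond the driver for your database is needed. Below is a minimal sketch of storing raw HTML through a bound parameter and then searching it with `LIKE`, assuming an in-memory SQLite database so it runs without a MySQL server. The `id` column, the sample pages and the `searchPages()` helper are hypothetical.

```php
<?php
// Assumption: in-memory SQLite replaces MySQL so the sketch runs anywhere.
// With MySQL the DSN would look like 'mysql:host=localhost;dbname=cache;charset=utf8mb4'
// (host and dbname hypothetical) and CREATE TABLE would use MySQL syntax.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE page (id INTEGER PRIMARY KEY, html TEXT)');

// Hypothetical page bodies standing in for fetched ads.
$pages = [
    "<h2>Don't miss: \"vintage\" bike</h2>",
    "<h2>Apartment for rent</h2><script>var s = 'a\\'b';</script>",
];

// The HTML is sent as a bound value, separate from the SQL text,
// so quotes and backslashes in it need no escaping at all.
$insert = $db->prepare('INSERT INTO page (html) VALUES (:html)');
foreach ($pages as $html) {
    $insert->execute([':html' => $html]);
}

// Keyword search: the LIKE pattern is also bound as a parameter.
// (The keyword is not escaped for the LIKE wildcards % and _ here.)
function searchPages(PDO $db, string $keyword): array
{
    $stmt = $db->prepare('SELECT id FROM page WHERE html LIKE :pattern ORDER BY id');
    $stmt->execute([':pattern' => '%' . $keyword . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$matches = searchPages($db, 'vintage');
print_r($matches);
```

A `LIKE '%keyword%'` scan cannot use an ordinary index, so with many cached pages a MySQL FULLTEXT index, or a dedicated search engine such as the ElasticSearch Phil mentions, would likely scale better.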

1 Answer


Try this, it should work:

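$page = file_get_contents('http://annapolis.craigslist.org/hea/3652436359.html');

// Escape the HTML before interpolating it into the SQL string.
// mysql_real_escape_string() uses the open mysql_connect() link, so connect first.
mysql_query("INSERT INTO `page` SET `html` = '" . magic($page) . "'");

function magic($value)
{
    // No stripslashes() here: magic quotes only ever applied to request input
    // ($_GET/$_POST/$_COOKIE), never to data read with file_get_contents().

    if (function_exists("mysql_real_escape_string")) {
        // Escapes NUL, \n, \r, \, ', " and Ctrl-Z using the connection's charset
        $value = mysql_real_escape_string($value);
    } else {
        // Weaker fallback that is not charset-aware
        $value = addslashes($value);
    }

    return $value;
}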
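Undoing magic quotes with `stripslashes()` is only appropriate for request input (`$_GET`, `$_POST`, `$_COOKIE`). HTML read with `file_get_contents()` was never magic-quoted, so stripping it removes backslashes that belong to the page itself. A small illustration with a made-up inline script (magic quotes were removed in PHP 5.4, `get_magic_quotes_gpc()` itself was removed in PHP 8.0, and the `mysql_*` functions were removed in PHP 7.0):

```php
<?php
// Hypothetical snippet of fetched HTML; single quotes keep the backslashes literal.
$html = '<script>var re = /\d+/; var s = "a\"b";</script>';

// What a magic-quotes "undo" step would do to it.
$stripped = stripslashes($html);

echo $html, PHP_EOL;     // original, backslashes intact
echo $stripped, PHP_EOL; // backslashes gone: the regex and the string literal change
```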
Tapas Pal