5

I want to store articles in a database, but I cannot seem to find much information on the best way to do this, from what I have read it seems split between most people on how to effectively do this. A lot of people will suggest a way and others will point out sql injection issues, and I cannot seem to find much about this topic that is fairly new.

Here is the html of an article:

    <div id="main">

        <article>

            <header>
                <h3> Title </h3>
                <time pubdate="pubdate"> 2011-07-22 </time>
            </header>

            <p> Article Text </p>

        </article>

    </div>

Ideally I guess it would be best to store the chunk of html making up each article into a database but there seems to be a lot of problems with this, and like I said I can't find many posts over this particular topic, and as someone new to php and databases I want to get some input on the best way to go about this before I proceed.

valon
  • 467
  • 1
  • 7
  • 16
  • I don't understand why people are saying SQL Injection problems. I thought with param binding, those kinds of problems are mostly a thing of the past. Am I wrong? – vol7ron Jul 23 '11 at 02:52

6 Answers6

2

When ever I store a large amount of user text, I just base64 it, then before you display it, make sure to run it through htmlspecialchars, this will keep html from working, so htmlspecialchars(base64_decode($content)) would work fine for displaying.
If you are using bbcode for formatting, then make sure to run htmlspecialchars before you start formatting your bbcode.

This isn't the only way, you can sanitize inputs without base64'ng it, but I see no reason not to, especially when nobody needs to see directly into the database.

Jess
  • 8,628
  • 6
  • 49
  • 67
1

the safest way to prevent sql injection here is to use prepared statement.

$stmt = $con->prepare("INSERT INTO Articles (Title, Date, Article) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $title, $currentDate, $articleBody);

The question marks represent the values you will pass. "sss" is saying that each of the 3 variables will be a string and then you can call this prepared statement and pass it the correct values.

$title = $_POST[title];
$currentDate = date("Y-m-d H:i:s");
$articleBody = $_POST[article];
$stmt->execute();

this will make sure that no malicious sql can be injected into your database.

hope this helps!

Danny Blue
  • 463
  • 4
  • 16
1

Storing it in a SQL db is fine, but you can and you must protect against SQL injection in your code.

ie, cleaning all user input before sending it to the db.

PHP Manual on SQL injection

Tom Studee
  • 10,316
  • 4
  • 38
  • 42
1

I think the best method is to just store pure text, but usually that is not the case when you want to use extra formatting. You can convert the html tags to bbcodes or similar tags which can prevent sql injection however if you escape the html content it would be as safe as any other content. so do mysql_real_escape_string on whatever data you put into the database and you would be fine.

However, the best practice would be to store the html code along with the article text as a html file which you can serve when the user requests the data but in the database you can just store purely text for indexing and search purposes. This is ideal as you would not need the html content for searching anyways and it will also prevent sql attacks if the content is purely text that is to be stored in the database. But as the user requests the file get the content of the html file for that article which contains the formatted text and serve that.

Vish
  • 4,508
  • 10
  • 42
  • 74
1

use lucene or sphinx, either from Zend_Lucene or through solr. they will make the indexing for the article faster, and you can also do a full text search on them too. using lucene or solar to index and search in these cases is pretty much a standard procedure, and will let you scale to millions of articles.

sphinx is a daemon that runs "in parallel" to the mysql daemon. for using sphinx, you can use the pecl sphinx extension.

if you want to go with lucene, you can try zend_lucene or solr, which is actually a tomcat distro with an webapp that exposes lucene as a web service, so you can access it in a standard way, independantly of the language.

choosing either of them is ok. you can index by full text (content), and categories, or whatever you need to index by.

marcelog
  • 7,062
  • 1
  • 33
  • 46
0

Store your article as TEXT :) Just pass it through this php function first to prevent injection attacks:

// Prevent MySQL Injection Attacks
function cleanQuery($string){
    if(get_magic_quotes_gpc())  // prevents duplicate backslashes
        $string = stripslashes($string);
    return mysql_escape_string($string);
}
Paul
  • 139,544
  • 27
  • 275
  • 264
  • 1
    My advice would be to use PDO => http://net.tutsplus.com/tutorials/php/why-you-should-be-using-phps-pdo-for-database-access/. This is safer, faster and very nice if you want to do testing of your database. To do testing just use SQLite in memory mode => new PDO('sqlite::memory:') – Alfred Jul 23 '11 at 08:55