22

The basic question is: is it safe to store HTML in a database if I restrict who can submit to it?

I have a pretty simple question. I provide video tutorials and other content. Without spending months writing a proper BBCode parser, I would need to store the HTML so I can have it look exactly the way I want when I grab it from the database.

Basically I plan to store all information in the database about a tutorial series and each episode. I would like to have some formatting for the descriptions for both so I can add multiple paragraphs, ordered and unordered lists, links to required resources, and so on.

I am using PHP and creating my own database. Right now I am using phpMyAdmin to store the information in the table. I will use a user with read-only rights when I pull the information in the PHP code.

What is the best way to do this? Thank you!

Ethosik
  • The question is unclear. It is certainly possible to store HTML (or any other markup or language) in a database, and handling it with PHP is possible too. All you have to make sure of is that you escape the content so that a) your code is not open to SQL injection and b) the statements are valid regardless of the content's value. The best way is to use prepared statements for this; take a look at PDO. – arkascha Jan 30 '13 at 14:16
  • My apologies. The question basically is: is it safe to store raw HTML in the database if I am the only person editing the information, through phpMyAdmin? – Ethosik Jan 30 '13 at 14:22
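
As a rough illustration of the prepared-statement advice in the comment above, here is a minimal sketch using PDO. It assumes the `pdo_sqlite` extension is available and uses an in-memory SQLite database only so the example is self-contained; with MySQL only the DSN and credentials would change. The table name `episodes` and its columns are hypothetical.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real MySQL
// database. The table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE episodes (id INTEGER PRIMARY KEY, description TEXT)');

// The HTML is passed as a bound parameter, so the quotes and angle brackets
// inside it cannot change the meaning of the SQL statement.
$html = '<p>It\'s required: <a href="files/start.zip">Download this file</a></p>';
$insert = $pdo->prepare('INSERT INTO episodes (id, description) VALUES (:id, :description)');
$insert->execute([':id' => 1, ':description' => $html]);

// Reading goes through a prepared statement as well.
$select = $pdo->prepare('SELECT description FROM episodes WHERE id = :id');
$select->execute([':id' => 1]);
$stored = $select->fetchColumn();

echo $stored, PHP_EOL;
```

Binding parameters protects the query itself; it says nothing about whether the stored HTML is safe to display, which is a separate question.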

5 Answers

11

Like others have pointed out, there's nothing dangerous about storing HTML in the DB. But when you display it, you need to know the HTML is safe. Seeing as you're the only one editing the HTML, I see no problem.

However, I wouldn't store HTML at all. If all you need are headings, paragraphs, lists, links, images, etc., I'd say Markdown is a perfect fit. The benefit of Markdown is that it looks just like normal text (i.e. you could send your articles as e-mails or save them as txt documents), it takes up a lot less space than HTML, and you don't have to change it once HTML gets updated.

http://michelf.ca/projects/php-markdown/

powerbuoy
  • If I am the only one able to edit and add to the database through phpMyAdmin, will I need to use something like Markdown? If somebody gets hold of my credentials and can go into phpMyAdmin themselves and add unsafe code, I have other problems, right? I will never add an admin section; I will pretty much always do it through phpMyAdmin with queries. And when I read the information, that user will have read-only access. – Ethosik Jan 30 '13 at 14:41
  • That's true, but regardless I would still recommend using Markdown as it's much more like normal text and doesn't limit you to only HTML. It's easier to write, takes up less space in your DB, is totally safe and doesn't need to change when HTML gets updated. – powerbuoy Jan 30 '13 at 18:44
  • Well, I will mostly just use tags like p, ul, ol, li, and a. I will not get too fancy with it because these are just short descriptions. 90% of them will not have any HTML at all. Do I need to spend quite a bit of time to implement them with Markdown? Does this eliminate the possibility of somebody removing my Markdown code and replacing it with HTML tags, or does HTML get parsed as well? I am not building a massive system here. At most I will probably have 300 entries in the entire database, maybe 10 or 20 of them with HTML, not thousands like a forum. – Ethosik Jan 30 '13 at 19:13
  • The Markdown version that I use works like this: `echo Markdown($textFromDB)` and it comes out as (X)HTML. All you need to do to use it is `include 'Markdown.php'`. The Markdown syntax is super easy to learn as well and (as I mentioned before) looks more like regular text than a markup language. – powerbuoy Jan 30 '13 at 21:53
5

Storing HTML code is fine. But if it is not from a trusted source, you need to check it and allow only a safe subset of markup. HTML Tidy can clean up the markup, but for filtering untrusted input a dedicated sanitizer such as HTML Purifier is the safer choice.
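
To make "a subset of markup" concrete, here is a minimal sketch using PHP's built-in `strip_tags()` with an allow-list of the tags mentioned in the question (p, ul, ol, li, a). The helper name `keepBasicTags` is made up for illustration. Note the limits: `strip_tags()` keeps the text inside removed tags and does not touch attributes on allowed tags (an `onclick` or a `javascript:` link would survive), so on its own this is not a sufficient filter for untrusted input.

```php
<?php
// Tags to keep; everything else is removed. This list is an assumption
// based on the tags named in the question.
const ALLOWED_TAGS = '<p><ul><ol><li><a>';

// Hypothetical helper: reduce markup to the allow-listed tags.
function keepBasicTags(string $html): string
{
    // strip_tags() drops tags that are not in the allow-list.
    // It does NOT inspect attributes on the tags it keeps.
    return strip_tags($html, ALLOWED_TAGS);
}

$input = '<p>Intro</p><script>alert(1)</script><ul><li>One</li></ul>';
echo keepBasicTags($input), PHP_EOL;
```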

Also, you need to account for future changes in the website's design, so do not use too much markup, only basic tags. To make it look the way you want, use global CSS rules and semantically named classes in the markup.

But even better is to use Markdown or another wiki-like syntax. There are nice JS editors for Markdown with real-time preview (like the one here at Stack Overflow), and you can avoid HTML altogether.

Josef Kufner
5

From a security point of view, storing your HTML in a database is no less secure than storing it anywhere else, provided you are the only author of that HTML. If other people can author HTML on your website, then it doesn't matter where you store it, only how you sanitize it and how and where you display it.

Now whether or not it is an efficient way to store HTML is a completely different matter. If I were you I would use some decent templating system and store HTML in files.

Zed
  • That is what I am doing. Instead of creating static HTML pages with the content, I am building a template system and the information gets pulled from the database. But the issue is, some tutorials and content will need to have required resources while others will not. So I will need to add HTML anchor tags in the description like Download this file. So I cannot really create a template system when some tutorials require things that others do not. I hope that makes sense. – Ethosik Jan 30 '13 at 14:24
3

My initial answer to "should I store html in a db" is generally no. Sure it's safe if you know what you're storing, but are you really considering best practices when you ask only that question? The true answer is "It depends".

I'm sure there are things like WordPress that store HTML in a database. However, as a professional website designer, I like to remember the Separation of Concerns principle. How reusable is HTML stored in your database for a mobile app? Is your back end now in charge of display as well as data? Do you have many implementation possibilities for a front end, or are you now stuck with whatever the back end portrays? What if you want it a different color and you've stacked ul within ul within ul? How easy is the CSS styling now? How easy is it to change or update that HTML?

I could be wrong, but even Sitecore and Kentico may store an html template in a database somewhere, but the data associated with that html template is a model, not directly on the html template.

So, when you are considering this question, you may want to store your models in one place and your templates in another. That way, when you say "hey, let's build a mobile app", you can grab your data and go, rather than creating yet another table to store the same data.
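
A small sketch of that separation, under assumed names: the tutorial is kept as plain data (as it might come from database rows), and a separate function turns it into HTML. The array shape and the functions `escapeHtml` and `renderTutorial` are hypothetical, not taken from any framework. Optional resources are just an empty or non-empty list, so one template covers tutorials with and without downloads.

```php
<?php
// Hypothetical model: plain data with no markup in it.
$tutorial = [
    'title' => 'Episode 1',
    'paragraphs' => ['Set up the project.', 'Use <b> sparingly.'],
    'resources' => [
        ['label' => 'Download this file', 'url' => 'files/start.zip'],
    ],
];

// Escape a value for safe output inside HTML text or attributes.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Template: the only place that knows about tags.
function renderTutorial(array $tutorial): string
{
    $html = '<h2>' . escapeHtml($tutorial['title']) . '</h2>';
    foreach ($tutorial['paragraphs'] as $paragraph) {
        $html .= '<p>' . escapeHtml($paragraph) . '</p>';
    }
    // The resource list is rendered only when the tutorial has resources.
    if (!empty($tutorial['resources'])) {
        $html .= '<ul>';
        foreach ($tutorial['resources'] as $resource) {
            $html .= '<li><a href="' . escapeHtml($resource['url']) . '">'
                . escapeHtml($resource['label']) . '</a></li>';
        }
        $html .= '</ul>';
    }
    return $html;
}

echo renderTutorial($tutorial), PHP_EOL;
```

The same `$tutorial` array could be sent to a mobile app as JSON with `json_encode()`, which is the reuse this answer is pointing at.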

Patrick Knott
0

I made a really big mistake by storing text data in MongoDB GridFS with compression and using mongodump for daily backups. GridFS holds about 1 GB of text files, but after each backup memory usage rises, sometimes by 1 GB a day; after one month it was at 20 GB in memory because of how this backup is made.

In MongoDB you should take a snapshot of the data folder rather than run mongodump. The likely reason is that mongodump copies unused data from disk into memory and then produces the BSON dump. In my case, text that had not been used for a long time should never have been loaded into memory. I think this is how the backup works, because right now my MongoDB is using 200 MB of RAM, and after running mongodump it can rise to 3 GB.

So I think the best solution is to use the filesystem for storing HTML files, since even a RAID controller like the PERC H700 has many useful caching features, including read-ahead. It has some limitations, such as network access, and in my experience some data became corrupted over time and I needed to run chkdsk to repair it, as many GB of data were added or removed daily. You should also consider using the proper RAID cache settings, such as write-through, which prevents data loss on power failure.

SQLite is not designed to be used with extremely large data sets and lacks many caching features, so you shouldn't use it for this.

An imperfect solution is to use MariaDB, or your own caching script in Node.js that uses memcached or a Linux ramdisk with maybe 1 GB of hot cache. An internal Node.js caching mechanism can produce memory leaks after some time. So I can use Node.js for the network connections while I/O goes through the filesystem with file locks, and the most-used "hot" files can either be cached in RAM or just left as is.

user956584