2

I'm building a custom CMS system that will store articles.

What is the correct way to store the article in the db? Given that an article may consist of a single paragraph or any number of paragraphs and may contain images it doesn't seem practical to store each paragraph in a distinct field of the db.

I have looked at how Wordpress stores posts and it seems to lump the raw html in a single field `post_content'. Is this an accepted method? The disadvantage I see with this method is that for each article any html mark-up is also stored. If I only stored the raw text then the dynamic page which displsys the article would only contain the html mark-up just once. But then how to derentiate between paragrpahs and images?

Any thoughts?

Thank you.

user2056238
  • 83
  • 2
  • 10
  • Images could be uploaded to some folder on your server and then the `` tag included in your text blob. I think most places doing bigger text fields use some sort of text blob, which for mysql I think you would find here: http://stackoverflow.com/a/332805/1618257 – David Starkey Mar 26 '13 at 22:21

1 Answers1

3

The field of choice would be a BLOB type (for example TEXT).

What data to store is however a different matter, and really depends on your implementation. I've worked on similar systems and have in the past stored data as:

  • XML - custom XML tags that get translated and converted by the application layer into HTML. This method also means you can do additional querying of the data with XPATH queries.
  • WMD - storing the data as WMD means there aren't any HTML tags which can help guard against cross site scripting attacks (XSS)
  • HTML this is the easiest and quickest way of stroing data. Providing you sanitise the input this goes a long way help XSS and other data you don't want but then this adds more complexity that WMD can solve from the beggining.

If you're looking for some kind of analytic data from the images which are linked, I would suggest storing the image URI in another table and linking to a primary key ID of the URL. This means that you can check and not duplicate image URL's. Image URI could equally be an ID if you upload and store the images within your domain.

Steve
  • 3,673
  • 1
  • 19
  • 24
  • Thanks - the XML approach looks interesting. Using custom XML tags... would I simply use tags such as content and then write a script to extract the content from the tags? – user2056238 Mar 27 '13 at 00:35
  • @user2056238 yes, if you have tags you can extract the data and images as you see fit - you can even start to include attributes e.g. and you can perform direct queries using XPATH from the database itself. – Steve Mar 27 '13 at 11:30
  • and so then I would store the entire xml feed as single BLOB type right? Then I can grab it and use the XML reader class. I was actually using this today for another application but I can see how I can easily use it for my CMS too. – user2056238 Mar 27 '13 at 18:48
  • how would I store snippets of text that I might want to be bold for a paragraph for example: This is bold text this is regular – user2056238 Apr 01 '13 at 10:38
  • 1
    @user2056238 this really depends on your implementation. You could do normal textbold text or something like markdown (for which there are many parsers for all kinds of different languages) and do this is normal **this is bold** – Steve Apr 03 '13 at 18:47