2

I am developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them in different categories and use ranking and recommendation techniques.

  • Should I go with MySQL?

  • Is it worthwhile using IBM DB2 pureXML to store the documents? Also, Ruby search implementations (Ferret, Ultrasphinx and others) would not be needed if I choose DB2. Is that correct?

  • What are the advantages of PostgreSQL in this?

  • Does it make sense to use CouchDB in this scenario?

I'd like to choose the best option without over-complicating the solution, so I discarded the idea of using two different storage solutions (one for the news documents and another for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL Server.

Nimantha
agaelebe

5 Answers

3

pureXML is heavier than SQL, so you pay more for each round trip between the web server and the DB. If you plan to have lots of users, I'd avoid it. You're better off letting your web server cache the requests, which avoids creating the XML (RSS) every time, if that is what you are thinking about.
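As a rough sketch of that caching idea, the snippet below keeps rendered feed XML in memory for a fixed time so it is not rebuilt on every request. `FeedCache` and `build_rss` are hypothetical names made up for illustration, not part of Rails or any library, and the sketch assumes a single process with no thread safety.

```ruby
# Minimal in-memory cache with a time-to-live.
# FeedCache is a hypothetical helper, not a Rails or library API.
class FeedCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {} # key => [expires_at, value]
  end

  # Returns the cached value for key if it has not expired;
  # otherwise runs the block, caches its result and returns it.
  def fetch(key, now = Time.now)
    expires_at, value = @store[key]
    return value if expires_at && now < expires_at

    value = yield
    @store[key] = [now + @ttl, value]
    value
  end
end

# Hypothetical stand-in for the expensive step (querying the DB, building XML).
def build_rss(category)
  "<rss><channel><title>#{category}</title></channel></rss>"
end

cache = FeedCache.new(300) # keep each feed for five minutes
builds = 0
2.times do
  cache.fetch("sports") do
    builds += 1
    build_rss("sports")
  end
end
puts builds # the feed is built once; the second request is served from the cache
```

In a real Rails app the framework's own caching facilities would probably be a better fit than a hand-rolled class; the point here is only that the XML is generated once per expiry window rather than once per request.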

I'd go with MySQL because it's really good at serving and it's totally free. PostgreSQL is free too, but I haven't used it, so I can't say.

CouchDB could make sense, but not if you plan on doing OLAP (Offline Analysis) of your data; a normal RDBMS will be better at that.

Robert Gould
  • Last time I checked, OLAP is an acronym for Online Analytical Processing – Noah Goodrich Nov 19 '08 at 15:23
  • I got the acronym wrong for sure, but what it practically means is that you analyze your data offline (as in warehousing, not on live servers). That's what I meant, not that O.L.A.P. == Offline Analysis. But thanks for the comment and for explaining the acronym – Robert Gould Nov 19 '08 at 15:28
  • But it doesn't mean that you have to analyze data offline. In fact, there are any number of applications that perform both OLTP and OLAP functions and thus need databases or at least tables within the same database that are optimized for both uses. – Noah Goodrich Nov 19 '08 at 15:52
3

Admitting first that I generally don't like MySQL, I will say that there has been writing on this topic regarding PostgreSQL:

http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html

This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.

Dustin
1

MySQL is probably one of the best options out there: light, easy to install and maintain, multiplatform and free. On top of that, there are some good free client tools.

Something to think about: because of the nature of your system, you will probably have some tables that grow very large very quickly, so you might want to think about performance.

MySQL supports table partitioning for this, but only from version 5.1, and what it offers is horizontal (row-based) partitioning, not vertical.
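To illustrate what the partitioning available from MySQL 5.1 looks like, here is a small Ruby sketch that builds the DDL for a table split into one RANGE partition per year, so that rows are distributed across partitions by date. The table and column names (`news_items`, `published_on`) and the helper `partitioned_news_ddl` are assumptions made up for this example; in a Rails app the resulting string could be passed to `execute` in a migration.

```ruby
# Builds a CREATE TABLE statement with one RANGE partition per year.
# Table name, column names and this helper are hypothetical.
def partitioned_news_ddl(years)
  partitions = years.sort.map do |year|
    "  PARTITION p#{year} VALUES LESS THAN (#{year + 1})"
  end
  # Catch-all partition for rows newer than the last listed year.
  partitions << "  PARTITION pmax VALUES LESS THAN MAXVALUE"

  # MySQL requires the partitioning column to be part of every unique key,
  # hence the composite primary key.
  <<~SQL
    CREATE TABLE news_items (
      id INT NOT NULL,
      published_on DATE NOT NULL,
      title VARCHAR(255) NOT NULL,
      PRIMARY KEY (id, published_on)
    )
    PARTITION BY RANGE (YEAR(published_on)) (
    #{partitions.join(",\n")}
    );
  SQL
end

puts partitioned_news_ddl([2008, 2009])
```

Partitioning by publication date is only one possible choice; it suits news data when most queries touch recent items, because older partitions can then be skipped or dropped.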

Nimantha
Jacobo
0

It sounds to me like the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, as it is known for its reliability.

You can check out the following link -- Bob Ippolito from MochiMedia tells us why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the recent issues with MySQL 5.1 suggest that they are still relevant.

http://bob.pythonmac.org/archives/category/sql/mysql/

Cygwin98
0

MySQL is good in production. I haven't used PostgreSQL with Rails, but it's a good solution as well.

In the dev and test environments I'd start out with SQLite (default), and perhaps migrate to your target DB in the test environment as you move closer to completion.
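A minimal sketch of that setup, written as a Ruby hash that mirrors what would normally live in `config/database.yml`. The database names, the `news_app` user and the choice of the `mysql2` adapter for production are assumptions for illustration; swap in `postgresql` or whatever target DB you settle on.

```ruby
require "yaml"

# Hypothetical per-environment settings mirroring config/database.yml.
DATABASE_CONFIG = {
  # SQLite for development and test: no server to install or administer.
  "development" => { "adapter" => "sqlite3", "database" => "db/development.sqlite3" },
  "test"        => { "adapter" => "sqlite3", "database" => "db/test.sqlite3" },
  # Assumed production target; database name and user are placeholders.
  "production"  => {
    "adapter"  => "mysql2",
    "database" => "news_production",
    "username" => "news_app",
    "host"     => "localhost"
  }
}.freeze

# Print the settings in the YAML form that database.yml uses.
puts DATABASE_CONFIG.to_yaml
```

Moving the test environment to the production adapter later is then a matter of changing the `"test"` entry, which is worth doing before release because SQLite is more lenient about column types than MySQL or PostgreSQL.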

Jamal Hansen