5

So there's this new cool thing, these NoSQL databases. And then there's my data: rows upon rows of meteorological data, i.e. values representing certain measurements at a certain station (identified by a WMO number, not coordinates) at a certain time.

Not every station measures every parameter, not every parameter is measured all the time.

I currently store this data (30 years' worth of hourly values, resulting in ~1 billion values) in MySQL. The continuous growth and the foreseeable addition of even more data give me a bit of a headache.

Reading about the document-based NoSQL systems, which seem to scale rather easily, I was wondering whether NoSQL is a viable data storage concept for meteorological data too. Do you have any experience with this?

Update: I forgot to mention the typical queries. Most of them need data along the time axis, e.g. give me the temperatures of station 066310 from 01.01.2010 00:00 to 01.03.2010 00:00.

Or: give me the most recent values of all parameters of a particular station.
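To make the two query shapes concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than MySQL. The table name `measurement`, its columns, and the sample rows are all hypothetical, and I am assuming the date range is meant as 1 January to 1 March 2010 with the end excluded.

```python
import sqlite3

# Hypothetical schema: one row per (station, parameter, timestamp).
# The composite primary key doubles as the index both queries rely on.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        station   TEXT NOT NULL,   -- WMO number, e.g. '066310'
        parameter TEXT NOT NULL,   -- e.g. 'temperature'
        ts        TEXT NOT NULL,   -- ISO-style timestamp, sorts lexicographically
        value     REAL NOT NULL,
        PRIMARY KEY (station, parameter, ts)
    )
""")

# Made-up sample rows, for illustration only.
rows = [
    ("066310", "temperature", "2009-12-31 23:00", -1.5),
    ("066310", "temperature", "2010-01-01 00:00", -2.0),
    ("066310", "temperature", "2010-02-15 12:00", 4.5),
    ("066310", "temperature", "2010-03-01 00:00", 6.0),
    ("066310", "pressure",    "2010-02-15 12:00", 1013.0),
    ("066700", "temperature", "2010-01-01 00:00", 1.0),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)


def time_range(station, parameter, start, end):
    """Values of one parameter at one station for start <= ts < end."""
    return conn.execute(
        "SELECT ts, value FROM measurement "
        "WHERE station = ? AND parameter = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts",
        (station, parameter, start, end),
    ).fetchall()


def latest_values(station):
    """Most recent value of every parameter measured at one station."""
    return conn.execute(
        "SELECT m.parameter, m.ts, m.value FROM measurement m "
        "JOIN (SELECT parameter, MAX(ts) AS ts FROM measurement "
        "      WHERE station = ? GROUP BY parameter) last "
        "  ON last.parameter = m.parameter AND last.ts = m.ts "
        "WHERE m.station = ? ORDER BY m.parameter",
        (station, station),
    ).fetchall()
```

Parameters a station never measures simply have no rows in this layout, so the sparse coverage mentioned above needs no special handling.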

Christian Studer
  • What exactly is giving you a headache? Management of the database? Performance? Aggregating the data? Something else? If it's performance related, have you analysed the query plan for your queries? Maybe you need better indexes, or to tune your database settings (PostgreSQL is great at this). How big is your dataset on disk? 1 GB? More? Less? – Mike Apr 09 '10 at 08:27
  • Hard to tell without knowing all the gory details of your table structure and the specifics of your queries, but you might gain a lot of (read) speed in a classic database by, for example, clustering your table on the date field (and providing appropriate indexes for your queries)... – ChristopheD Apr 09 '10 at 08:35
  • @Mike: The current database is around 30 GB on disk, but future expansions will increase that to 100-300 GB. Queries are analyzed and tables indexed accordingly. What gives us headaches is the general, well, size of things. Backups, replication restoration, bulk inserts with heavy indexing activity are all taking longer and longer. @ChristopheD: Clustering is definitely something we're looking into. – Christian Studer Apr 16 '10 at 12:31

3 Answers

2

NoSQL could be a fit when your data structure is quite simple and predictable (for example, a simple key-value store) and you need neither relational integrity nor ad-hoc or advanced querying.

What you win in easy scalability you might lose in flexibility and consistency though.

The biggest problem would be finding an easy means of composing complex queries over your data. I would say meteorological data is not the best candidate for NoSQL.

I personally prefer PostgreSQL over MySQL and find it very scalable (even with millions or even billions of rows) when set up correctly.

ChristopheD
  • This is not entirely correct. NoSQL can fit very complex data as well; think of graph databases, for example. Then there are also the simpler key-value NoSQL datastores. There is a very wide variety of NoSQL solutions. – ase Apr 09 '10 at 08:18
  • @adamse: good point about the broadness of the NoSQL term, although I think a graph database would not be the best fit for meteorological data ;-) – ChristopheD Apr 09 '10 at 08:23
1

I think you should try a full-featured and mature DBMS before giving up on SQL.

See for instance:

http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/

http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/

Marco Mariani
1

I find it hard to create a coherent answer right now, but here goes.

  1. Your data would fit without problem in a "nosql" datastore such as Cassandra (and many more probably)
  2. You would benefit from the schema-less design of many "nosql" solutions (seeing as not all columns (to use a MySQL term) are present all the time)
  3. The time based queries would be no problem in Cassandra (check out TimeUUID based keys)
  4. You don't seem to be taking advantage of the relational part of MySQL, so you wouldn't be hurt that much when losing it
  5. You might be just fine with MySQL, though: you don't really describe what kind of problems you have, so are you actually having any? (Just being interested is totally cool)
  6. Things like indexes and search are things you would have to implement manually in many NoSQL datastores; if this scares you, perhaps stick with SQL.
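To illustrate points 3 and 6, here is a rough sketch of the idea behind time-ordered keys: one "row" per (station, parameter), with its entries kept sorted by timestamp so that a time-range query becomes a contiguous slice. This is a toy in-memory model in plain Python, not Cassandra's API; the class `SeriesStore` and its method names are made up for illustration.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class SeriesStore:
    """Toy stand-in for a wide-row layout; NOT a Cassandra client."""

    def __init__(self):
        # (station, parameter) -> list of (timestamp, value), kept sorted
        self._rows = defaultdict(list)

    def insert(self, station, parameter, ts, value):
        # Keeping the entries ordered is the "index" we maintain by hand.
        bisect.insort(self._rows[(station, parameter)], (ts, value))

    def time_range(self, station, parameter, start, end):
        """Entries with start <= timestamp < end, as one contiguous slice."""
        cols = self._rows.get((station, parameter), [])
        lo = bisect.bisect_left(cols, (start,))
        hi = bisect.bisect_left(cols, (end,))
        return cols[lo:hi]

    def latest(self, station):
        """Most recent (timestamp, value) per parameter for one station."""
        # Scans all row keys: without a secondary index on station,
        # this lookup is something you have to build yourself.
        return {
            parameter: cols[-1]
            for (st, parameter), cols in self._rows.items()
            if st == station and cols
        }


# Made-up sample values, inserted out of order on purpose.
store = SeriesStore()
store.insert("066310", "temperature", datetime(2010, 2, 15, 12), 4.5)
store.insert("066310", "temperature", datetime(2010, 1, 1, 0), -2.0)
store.insert("066310", "temperature", datetime(2010, 3, 1, 0), 6.0)
store.insert("066310", "pressure", datetime(2010, 2, 15, 12), 1013.0)
```

The range query is cheap because the data is already ordered the way the query needs it. The `latest` lookup shows the flip side: anything the key layout does not directly support has to be scanned or indexed manually.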

Thanks for listening ;)

ase