
I have an application on a relational database that needs to change in order to store more data. My problem is that just 2 of the tables will store more data (up to billions of entries), and one of the tables is linked by foreign key to other tables. I could give up the relational model for these tables. I'd like to keep the rest of the db intact and change only these 2 tables. I also run a lot of queries on these tables, from simple selects to group by and subqueries, which adds to the problem.

My experience with NoSQL is limited, so I'm asking which NoSQL system (if any) suits my needs:

- huge data
- complex queries
- integration with a SQL database. This is less important than the first two, and I could migrate my entire db to an equivalent if it's worth it.

Thanks

qtm
    The vast array of technologies commonly grouped as "NoSQL" are no more siblings to each other than they are to SQL databases. – Philipp Nov 26 '12 at 09:26

1 Answer


Both relational databases and NoSQL approaches can handle datasets with billions of data points. With the information supplied, it is hard to make a meaningful and specific recommendation. It would help to know more about what you are trying to do with the data, what your options are regarding hardware and network topology, etc.

I assume that, since you are currently using a relational database, you have probably already looked at partitioning or otherwise structuring your larger tables so that your query performance is satisfactory. This activity by itself can be non-trivial, but IMHO, a good database design with optimized SQL can take you a very long way before there is a clear need to explore alternatives.
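As one illustration of "otherwise structuring" a large table, here is a minimal sketch of a summary (rollup) table that is maintained at write time, so that a big group-by reads a handful of pre-aggregated rows instead of scanning every raw row. Everything in it is an assumption for illustration only: the `events` and `daily_totals` tables, their columns, and the helper function names are hypothetical, and SQLite stands in for MySQL just to keep the example self-contained.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a raw table and a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            day TEXT NOT NULL,
            amount INTEGER NOT NULL
        );
        -- Hypothetical rollup: one row per (customer, day).
        CREATE TABLE daily_totals (
            customer_id INTEGER NOT NULL,
            day TEXT NOT NULL,
            n INTEGER NOT NULL,
            total INTEGER NOT NULL,
            PRIMARY KEY (customer_id, day)
        );
        """
    )
    return conn


def record_event(conn, customer_id, day, amount):
    """Insert a raw row and update the rollup in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (customer_id, day, amount) VALUES (?, ?, ?)",
            (customer_id, day, amount),
        )
        cur = conn.execute(
            "UPDATE daily_totals SET n = n + 1, total = total + ? "
            "WHERE customer_id = ? AND day = ?",
            (amount, customer_id, day),
        )
        if cur.rowcount == 0:  # no rollup row yet for this (customer, day)
            conn.execute(
                "INSERT INTO daily_totals (customer_id, day, n, total) "
                "VALUES (?, ?, 1, ?)",
                (customer_id, day, amount),
            )


def totals_from_raw(conn):
    """The expensive query: aggregates over every raw row."""
    return conn.execute(
        "SELECT customer_id, COUNT(*), SUM(amount) FROM events "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()


def totals_from_summary(conn):
    """Same answer, computed from the much smaller rollup table."""
    return conn.execute(
        "SELECT customer_id, SUM(n), SUM(total) FROM daily_totals "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
```

The trade-off is extra work on every write and a second table to keep consistent, which tends to be acceptable when the data is written once and read often.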

However, if your data usage looks like write-once, read-often, the join dependencies are manageable, and you need to perform some aggregations over the data set, then you might start to look into alternative approaches like Hadoop or MongoDB. These choices come with trade-offs in terms of their performance, capabilities, platform requirements, latency, and so forth. Your particular question about integration between a NoSQL repository and a SQL database at the query level might not be realizable without some duplication of data between the two. For example, MongoDB does not like joins (http://stackoverflow.com/questions/4067197/mongodb-and-joins), so you must design your persistence model with that in mind, and this may involve duplication of data.
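To make the duplication point concrete, here is a minimal sketch in plain Python, with dictionaries standing in for stored documents. It does not use any real database driver, and the `customers` and `orders` data, the field names, and the two helper functions are all hypothetical. The customer fields a query needs are copied into each order document, so an aggregation can run without a join.

```python
from collections import defaultdict

# Hypothetical relational-style data: orders reference customers by key.
customers = {
    1: {"name": "Acme", "region": "EU"},
    2: {"name": "Globex", "region": "US"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25},
    {"order_id": 101, "customer_id": 1, "amount": 15},
    {"order_id": 102, "customer_id": 2, "amount": 7},
]


def denormalize(orders, customers):
    """Copy the customer fields needed for queries into each order document."""
    docs = []
    for order in orders:
        customer = customers[order["customer_id"]]
        docs.append(
            {
                **order,
                # Duplicated data: repeated in every order of this customer.
                "customer": {
                    "name": customer["name"],
                    "region": customer["region"],
                },
            }
        )
    return docs


def total_by_region(docs):
    """Aggregate over the documents alone; no lookup into customers needed."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["customer"]["region"]] += doc["amount"]
    return dict(totals)


docs = denormalize(orders, customers)
print(total_by_region(docs))
```

The cost is that a change to a customer's region has to be propagated to every document that embeds it, which is why this layout suits write-once, read-often data better than frequently updated data.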

The point I am trying to make is that identifying the "right" approach will depend on your specific goals and constraints.

icey502
  • Currently the app is running on MySQL. The database is on a single node, but it's possible to get up to 3 machines. The queries I'm running are complex, and it's possible for them to involve all the rows in the table (for a big group by). In my experience, MySQL can't handle that kind of operation in a reasonable time (less than 1 min per query), which is why I'm searching for alternatives. – qtm Nov 26 '12 at 06:50
  • Instead of giving us very broad and ambiguous statements, why don't you briefly describe what kind of data you have (for example, why would you want to do one big group-by?) ... That does not seem logical... imho – Omar Ali Ibrahim Mar 04 '19 at 08:28