-1

My Python application data structure is pure relational. My estimation for the biggest table is around 10 billion rows each year (all the other tables are very small). Each row size is about 20-30 bytes

What is the right database engine for me?

uriz
  • 567
  • 9
  • 25
  • What did you learn from googling your title & tags before posting? – philipxy Jun 25 '17 at 04:51
  • Possible duplicate of [Database choice for large data volume?](https://stackoverflow.com/questions/629445/database-choice-for-large-data-volume) – philipxy Jun 25 '17 at 04:54

2 Answers2

2

You might consider the following that I have used, but of course this will depend on what your data looks like and how your APP/Users need to interact with it. This is not an exhaustive list, it's only the stuff I have used.

  1. Greenplum database is a open source distributed Postgres database. http://greenplum.org/ It scales nicely and supports pretty much all Postgres stuff except for full text indexing last I knew

  2. Apache Phoenix: An open source sql layer on top of Hadoop/HBase. It scales nicely, but the ecosystem is a bit complex (as Per Hadoop). Cloudera's Impala is similar. https://phoenix.apache.org/

  3. Oracle Partitioning (preferably on RAC). If you can afford the license, Oracle partitioning allows for sharding of your data in various ways. If you have it with RAC, that will also provide parallel query execution

  4. Just partition your data (on any RDBMS) and put the partitions on good disk

Those are the 4 ideas I have actually used, and remember, on good hardware, with some table partitioning, 10B rows isn't really all that much, so you might just need to get a better box[s] and hook it to a SAN with SSD of some kind over 10G network or better. ALso think about putting indexes on a separate disk from where the db files are, and always use SSD if you can afford it.

Anyway, HTH

MG

Mark Giaconia
  • 3,844
  • 5
  • 20
  • 42
0

At 30 bytes per row that's less than 300GB, which is a small database, well within the capabilities of Oracle or SQL Server Enterprise editions. You won't need Oracle RAC.

You'll need to pay attention to application design and indexing/partitioning. Query and storage optimization will have a greater impact on performance than the choice of DBMS will.

nvogel
  • 24,981
  • 1
  • 44
  • 82