
I have a project that requires me to create tables on the fly. Each table needs to support at least 1000 columns. All INT, DATE, and BIT columns will be indexed, so there could be about 400 indexes on one table. Once data is uploaded to the server, no other inserts or updates will be performed on the table. I will use a library like Lucene to index text.
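To make that concrete, here is a rough sketch (in Python) of the kind of dynamic DDL generation I have in mind; the table name, column names, and types are just placeholders, not my real schema:

```python
# Rough sketch of the dynamic DDL generation I have in mind.
# The table name, attribute names, and types are placeholders.
attributes = [
    ("order_total", "INT"),
    ("shipped_on", "DATE"),
    ("is_active", "BIT(1)"),
    # ... in practice roughly 1000 attributes, loaded from metadata
]

def build_ddl(table_name, attributes):
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in attributes)
    create_table = f"CREATE TABLE {table_name} (\n  id INT PRIMARY KEY,\n  {cols}\n);"
    # Index every INT, DATE, and BIT column (roughly 400 indexes per table).
    indexes = [
        f"CREATE INDEX idx_{table_name}_{name} ON {table_name} ({name});"
        for name, sql_type in attributes
        if sql_type.startswith(("INT", "DATE", "BIT"))
    ]
    return [create_table] + indexes

for statement in build_ddl("dataset_001", attributes):
    print(statement)
```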

My questions are:

  1. Can CUBRID handle thousands of tables in one database?
  2. What is the performance of CUBRID when selecting on a table with a thousand columns and hundreds of indexes?
  3. Does CUBRID have a Windows GUI interface to run ad hoc queries and manage tables?
  4. Can DDL and DML statements be in the same transaction?

I am well aware of the database normalization design issues here. I have fully researched them and concluded that creating many tables and columns is the best solution for this project.

Luke101

1 Answer


Disclaimer: I am a part of the CUBRID team.

Short answer:

  1. Yes.
  2. Depends.
  3. Yes.
  4. Yes.

Long answer:

  1. As mentioned in CUBRID RDBMS Size Limits, you can create an unlimited number of tables, with an unlimited number of indexes per table, as long as you have enough disk space.

    However, the number of columns per table is limited to 6,400, which should be enough for you.

  2. It depends on the number of records you are going to have; it wouldn't matter much if you didn't have so many columns in one row. The number of records affects not only the data volume size (the file on disk where the data is stored) but also the index volume size, where index keys and values are stored. With your current schema design, where you have thousands of columns, I suspect the index volume will be significantly larger than the data volume, because every index stores a copy of every value in the column it indexes.

    Creating a proper set of indexes will allow you to improve the performance (see the first sketch after this list).

    1. Check out Index Optimization in CUBRID to learn more about how you can optimize your queries.
    2. To further optimize your query, leverage Index Hints.


    But you should seriously consider normalizing your data.

    Also, your hardware configuration will significantly affect the performance of the server. For your case, plenty of RAM and CPU power are highly recommended.

    Any other RDBMS will face similar issues with such a schema. CUBRID, in general, provides high SELECT performance with well-defined indexes, so that is where you should focus.

  3. There are three GUI tools you can use to administer your database.

    1. CUBRID Manager: an all-in-one, powerful database administration tool for CUBRID.
    2. CUBRID Web Manager: a Web-based alternative to CUBRID Manager.
    3. CUBRID Query Browser: a lightweight version of CUBRID Manager without the host management and monitoring features.
  4. CUBRID complies with the SQL-92 standard, so you can define a table and insert records within the same transaction (see the second sketch below).
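On point 2, here is a minimal sketch of what "a proper set of indexes" plus an index hint can look like. It assumes the CUBRID Python driver (CUBRIDdb, a DB-API 2.0 implementation); the connection string, table, and column names are made up, and the exact hint syntax is described in the Index Hints page linked above.

```python
# Sketch only: assumes the CUBRID Python driver (CUBRIDdb, DB-API 2.0).
# The connection string, table, and column names are placeholders.
import CUBRIDdb

con = CUBRIDdb.connect("CUBRID:localhost:33000:mydb:::", "dba", "")
cur = con.cursor()

# A composite index that covers the WHERE clause of a hot query is usually
# far more useful than hundreds of single-column indexes.
cur.execute("CREATE INDEX idx_events_status_date ON events (status, created_on)")

# If the optimizer still picks a poor plan, force the index with a hint
# (the USING INDEX clause from the Index Hints documentation linked above).
cur.execute(
    "SELECT id, status, created_on FROM events "
    "WHERE status = 1 AND created_on >= '2013-01-01' "
    "USING INDEX idx_events_status_date"
)
print(cur.fetchall())

cur.close()
con.close()
```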
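And a minimal sketch of point 4, mixing DDL and DML in a single transaction, under the same assumptions (CUBRID Python driver, placeholder names):

```python
# Sketch only: DDL and DML in one transaction, again assuming the CUBRID
# Python driver (CUBRIDdb, DB-API 2.0). Table name and connection details
# are placeholders. DB-API connections are non-autocommit by default, but
# verify the driver's autocommit setting for your version.
import CUBRIDdb

con = CUBRIDdb.connect("CUBRID:localhost:33000:mydb:::", "dba", "")
cur = con.cursor()

try:
    # DDL ...
    cur.execute("CREATE TABLE upload_batch (id INT PRIMARY KEY, loaded_on DATE)")
    # ... and DML in the same transaction.
    cur.execute("INSERT INTO upload_batch VALUES (1, '2013-02-19')")
    con.commit()    # the table and the row become visible together
except Exception:
    con.rollback()  # ... or neither does
    raise
finally:
    cur.close()
    con.close()
```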

esengineer
  • I see. Thank you for your detailed explanation, it has really helped. I have another question. Since several columns in a table may not be good for CUBRID, can CUBRID handle several inner or left joins? I need to perform about 150 inner joins in one query. Is this feasible in CUBRID? – Luke101 Feb 19 '13 at 23:27
  • Several columns are fine, but several thousand columns are not. There is no limit on the number of JOINs in CUBRID, but performance may suffer with so many JOINs. CUBRID can perform better if you can guarantee that you will execute the same queries over and over again; in that case, CUBRID will use [Query Plan Caching](http://www.cubrid.org/wiki_tutorials/entry/performance-implications-of-shared-query-plan-caching) to avoid recalculating the query plan, which will significantly improve performance. – esengineer Feb 20 '13 at 05:32
  • Nevertheless, my personal opinion is that relational databases are not designed to handle such analytical tasks. After all, CUBRID, like other RDBMSs, is a transactional (OLTP) system, not an analytical (OLAP) one. For this reason there are NoSQL solutions with MapReduce to crunch such schemaless large data. You may want to consider this as well. – esengineer Feb 20 '13 at 05:35