2

Suppose a database is defined with 1 million value-based partitions, but data is only imported into 10 partitions. In DolphinDB, actual partitions are generated only when data is imported.

So if there are many more partitions defined than the actual partitions containing data, how does this affect query speed?

molddd123
  • 297
  • 6

1 Answers1

1

If a certain partition range is specified in a where clause, it will have little impact on query performance. Otherwise, such as in the case of select count(*) from t, it will have a significant impact. Because all partitions need to be traversed when metadata is queried.

The difference can be seen in the following example, where two partitioning schemes are tested using the OLAP engine with a data volume of 47,723,974.

Scheme 1:

db1 = database(, VALUE, 2020.01.01..2021.01.01)
db2 = database(, HASH, [SYMBOL, 10])
db = database(dbName, COMPO, [db1, db2])

The time consumed by "select count(*)" is 22.582 ms.

Scheme 2:

db1 = database(, VALUE, 2000.01.01..2100.01.01)
db2 = database(, HASH, [SYMBOL, 100])
db = database(dbName, COMPO, [db1, db2])

The time consumed by "select count(*)" is 8623.119 ms.

zhihengqu
  • 371
  • 5