I have experiment data to store. The experiment can be run with different settings, say setting a and setting b, each of which can be set to various values. The experiment is run 100,000 times per combination of settings. How should I optimize my table for this data structure?
The naïve implementation is a table with columns setting a, setting b and result of experiment, adding the data row by row. setting a could for instance be set to the values (1, 2, 3) and setting b to (0.1, 0.01):
(_ROWID_)  setting a  setting b  result
(0,)       1          0.1        res_1
(1,)       1          0.1        res_2
(2,)       1          0.1        res_3
...
(n,)       2          0.1        res_n
(n+1,)     2          0.1        res_n+1
...
(k,)       3          0.1        res_k
(k+1,)     3          0.1        res_k+1
...
(l,)       1          0.01       res_l
(l+1,)     1          0.01       res_l+1
... etc.
Indices l > k > n > 0 are primary-key auto-incrementing row IDs. Suboptimal, but easy to implement. I found data retrieval to be slow. This implementation is neither good for memory (setting a and setting b are duplicated many times, so they should really be pointers to a set of results), nor good for search performance (the table is unordered, while searches should return ordered data).
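For concreteness, here is a minimal sketch of what I mean by the naïve layout (SQLite-style syntax; the names setting_a, setting_b and result are just my placeholders):

```sql
-- Naïve layout: one wide table, settings repeated on every row.
CREATE TABLE experiment (
    setting_a REAL NOT NULL,
    setting_b REAL NOT NULL,
    result    REAL NOT NULL
);

-- Typical lookup: all results for one settings combination, ordered.
SELECT result
FROM experiment
WHERE setting_a = 1 AND setting_b = 0.1
ORDER BY result;
```

Without an index on (setting_a, setting_b), the SELECT has to scan and then sort the whole table, which matches the slowness I am seeing.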
Another option is a child table for each combination of setting a and setting b (named table_a_b) and a parent table pointing to each child table; a "table of contents". Then result of experiment can be stored without having to reference setting a and setting b on every row. This makes queries more difficult, however.
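Roughly what I have in mind for this option (again SQLite-style; the table and column names are hypothetical):

```sql
-- Parent "table of contents": one row per settings combination,
-- recording which child table holds its results.
CREATE TABLE toc (
    setting_a   REAL NOT NULL,
    setting_b   REAL NOT NULL,
    child_table TEXT NOT NULL,
    PRIMARY KEY (setting_a, setting_b)
);

-- One child table per combination, e.g. for setting a = 1, setting b = 0.1:
CREATE TABLE table_1_0_1 (
    result REAL NOT NULL
);

INSERT INTO toc VALUES (1, 0.1, 'table_1_0_1');
```

The awkward part is that a query first has to look up the child table name in toc and then build the second query dynamically, since table names cannot be bound as parameters.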
Or I could use a clustered index, which forms a B-tree that does a similar job, if I understand correctly.
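As far as I understand, in SQLite this amounts to a WITHOUT ROWID table whose primary key starts with the settings, i.e. an alternative definition of the same experiment table (the run_id column is my addition, needed to keep the key unique):

```sql
-- The table itself is stored as a B-tree ordered by the primary key,
-- so all results for one settings pair are physically adjacent
-- and come back in key order without an extra sort.
CREATE TABLE experiment (
    setting_a REAL NOT NULL,
    setting_b REAL NOT NULL,
    run_id    INTEGER NOT NULL,  -- repetition counter within one combination
    result    REAL NOT NULL,
    PRIMARY KEY (setting_a, setting_b, run_id)
) WITHOUT ROWID;
```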
What are the benefits and drawbacks of these three options for the use case I described?