I'm working on constructing a database (SQLite) to store information about each run of a Mathematica script I've written. The script takes several input parameters, so my DB has a table with a column for each parameter (among other columns).
Some of the input parameters are lists of numbers. My first thought for storing these is to use a junction table as described in the accepted answer to this question. But I typically use the same list for several different runs. How can I look up whether any given list is already in the database, so I can reuse its ID rather than storing it again?
Constraints as mentioned in comments:
- There is no explicit upper bound on the length of a list but in practice it ranges from 1 to about 50.
- The number of distinct lists will be small, on the order of 10.
- I actually have 3 list parameters. For two of them, the values in the list are non-negative, double precision floating point numbers; for the third, the values are pairs of such numbers.
- There are no duplicate entries. (These are more precisely sets, so no duplicates and order is irrelevant)
- I can easily arrange for the list elements to be in sorted order.
For example: suppose my table is set up like this
CREATE TABLE jobs (id INTEGER PRIMARY KEY, param1 REAL, param2_id INTEGER);
CREATE TABLE param2 (param2_id INTEGER PRIMARY KEY, value REAL);
When I run the script, it sets the parameters and then calls a function to run the calculation, like so:
param1 = 4;
param2 = {.1, .3, .5};
runTheCalculation[param1, param2]
Assuming this is the very first run of the script, it will insert the following contents into the DB:
jobs: id param1 param2_id
1 4.0 1
param2: param2_id value
1 0.1
1 0.3
1 0.5
So far, so good. Now let's say I run the script again with one different parameter,
param1 = 2;
param2 = {.1, .3, .5};
runTheCalculation[]
In a naive implementation, this will result in the database containing this:
jobs: id param1 param2_id
1 4.0 1
2 2.0 2
param2: param2_id value
1 0.1
1 0.3
1 0.5
2 0.1
2 0.3
2 0.5
But I would like it to be able to look up the fact that the list {.1, .3, .5}
is already in the database, so that after the second run the DB contains this instead:
jobs: id param1 param2_id
1 4.0 1
2 2.0 1
param2: param2_id value
1 0.1
1 0.3
1 0.5
What sort of a query can I use to find that the list {.1, .3, .5}
already exists in the table param2
?
I'm not opposed to creating additional tables if necessary. Or if there is some model other than using a junction table that makes more sense, that's fine too.