Let's say we're using Django with a Postgres database.
I want to store a sequence of data like so:
- Record 1: 1, 2, 3, 4, 5
- Record 2: 7, 8, 2, 3, 1, 9, 6
- Record 3: 4, 4, 3, 2
A couple of points to note:
- The sequence will always be one-dimensional.
- A sequence can have redundant values.
- A sequence can be of variable length.
First, I want to store this information in the database. There are many ways to do that, so let's look at my querying requirements.
Let's say I have a query sequence 1, 2, 3. Now I want to identify the sequences that match this sequence. A match would meet one of the following cases:
- Case A: The sequence contains the query 1, 2, 3 in that order.
  - In the example, Record 1 matches this.
- Case B: The sequence contains the components of the query 1, 2, 3 in any order.
  - In the example, Records 1 and 2 match this.
- Case C: The sequence contains some of the components of the query 1, 2, 3 in any order.
  - In the example, all records match this.
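To make the three cases concrete, here is a minimal plain-Python sketch of how I picture the classification. It is not tied to Django or Postgres, and the function names (`contains_in_order`, `classify_match`) are made up for illustration. It assumes "in that order" means a contiguous run, and it compares distinct values only, so repeated values in the query are not counted.

```python
from typing import Optional, Sequence


def contains_in_order(sequence: Sequence[int], query: Sequence[int]) -> bool:
    """True if `query` appears in `sequence` as a contiguous run.

    Assumption: Case A means contiguous, not merely "same relative order".
    """
    n = len(query)
    return any(
        list(sequence[i:i + n]) == list(query)
        for i in range(len(sequence) - n + 1)
    )


def classify_match(sequence: Sequence[int], query: Sequence[int]) -> Optional[str]:
    """Return "A", "B", or "C" for the strongest case that applies, else None."""
    if contains_in_order(sequence, query):
        return "A"  # the query appears in that order
    present, wanted = set(sequence), set(query)
    if wanted <= present:
        return "B"  # every query value appears, in any order
    if wanted & present:
        return "C"  # only some query values appear
    return None  # no values in common


records = {
    "Record 1": [1, 2, 3, 4, 5],
    "Record 2": [7, 8, 2, 3, 1, 9, 6],
    "Record 3": [4, 4, 3, 2],
}
cases = {name: classify_match(seq, [1, 2, 3]) for name, seq in records.items()}
# cases == {"Record 1": "A", "Record 2": "B", "Record 3": "C"}
```

Doing this in Python for every row obviously won't scale, which is why I'm looking for a storage layout that lets the database do most of the filtering.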
In a perfect world I'd like the results to be ranked such that:
- Record 1 comes first (because it matches Case A, our highest-priority match)
- Record 2 comes second (because it matches Case B, our mid-priority match)
- Record 3 comes last (because it matches Case C, our low-priority match)
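The ranking I have in mind could be sketched as a sort key, shown here in plain Python. The names `match_key` and `rank_records` are hypothetical, "in that order" is again assumed to mean a contiguous run, and only distinct values are compared. Records that share no value with the query are dropped.

```python
from typing import Dict, List, Sequence, Tuple


def match_key(sequence: Sequence[int], query: Sequence[int]) -> Tuple[int, int, int]:
    """Build a sortable key: (in-order match, all values present, shared value count)."""
    n = len(query)
    in_order = any(
        list(sequence[i:i + n]) == list(query)
        for i in range(len(sequence) - n + 1)
    )
    shared = len(set(sequence) & set(query))
    all_present = shared == len(set(query))
    return (int(in_order), int(all_present), shared)


def rank_records(records: Dict[str, Sequence[int]], query: Sequence[int]) -> List[str]:
    """Return record names ordered Case A first, then Case B, then Case C."""
    scored = [(match_key(seq, query), name) for name, seq in records.items()]
    # Keep only records that share at least one value with the query.
    matched = [(key, name) for key, name in scored if key[2] > 0]
    # Larger keys sort first; the sort is stable, so ties keep their input order.
    matched.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in matched]


records = {
    "Record 3": [4, 4, 3, 2],
    "Record 2": [7, 8, 2, 3, 1, 9, 6],
    "Record 1": [1, 2, 3, 4, 5],
}
ranking = rank_records(records, [1, 2, 3])
# ranking == ["Record 1", "Record 2", "Record 3"]
```

The third element of the key also orders Case C matches among themselves by how many query values they share, which goes slightly beyond what I listed above but seems like a natural tie-breaker.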
Can anyone recommend a method, library, or concept for storing this data in such a way that queries can be made relatively fast?