-1

I have a string S = "1-2-3-4-5-6-7-8"

This is how my database table rows look like:

id SubSequence
1 1-2-4-5
2 1-3-4-5
3 2-5-7-8
4 5-8-9-10
5 6-7-10-11

and so on ...

I want to write a query that would update (in this example) only the first 3 rows because they're a subsequence of string S.

The current solution I have is to programmatically go thru each row, check if it's a subsequence, and update. But I'm wondering if there's a way to do it at the MySQL level for performance.

Update: I don't mind changing the way data is stored. For example, String S could be an array holding those numbers, and the "SubSequence" column can hold those numbers as an array.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Prashant Ghimire
  • 4,890
  • 3
  • 35
  • 46
  • Not sure what my question was marked duplicate. The other question that's mentioned as being duplicate does not talk about the solution to my problem. – Prashant Ghimire Feb 06 '21 at 19:36
  • 1
    Are the pieces of the sequences always numeric? always separated by `-`? always in numerical order? What is the largest possible number? (I am checking to see if some "tricks" could be used to make this task possible.) – Rick James Feb 08 '21 at 19:28
  • @RickJames Yes. Basically the problem is that i have a set of numbers S, ( length of S > 5). All the combinations (5 items long, C(S, 5) ) of items in S is stored in the table as above. Now I want to updated rows that are subset of S. – Prashant Ghimire Feb 08 '21 at 20:31
  • Furthermore, I have multiple sets like S. Eg. S1, S2, S3 with corresponding list of combinations L1, L2, L3. Between these sets, some elements will repeat. Therefore, L1, L2, L3 will also have common elements between them. The goal is to find out how many times each unique combination occur across all sets/combinations. Let me know if that makes sense. – Prashant Ghimire Feb 08 '21 at 20:37
  • I don't grok the S vs L. Perhaps my answer involves two columns (S, L) or two bitstrings (like 0x1FE)? – Rick James Feb 08 '21 at 23:49

2 Answers2

1

No, there is not a way to do the query you describe with good performance in SQL when you store the subsequences as strings like you have done. The reason is that doing substring comparisons cannot be optimized with indexes, so your query will be forced to do the comparisons row by row.

In general, when you try to store sets of values as a string, but you want to use SQL to treat them as discrete values, it's bound to be awkward, difficult to code, and ultimately have bad performance.

In this case, what I would do is make a two tables, one that numbers your entities, and a second table in which each value in your subsequence is stored on a row by itself.

SubSequences:

id
1
2

SubSequenceElements:

id SubSequenceElement
1 1
1 2
1 4
1 5
2 1
2 3
2 4
2 5

And so on.

Then you can use techniques to find cases where every element of this set exists in the set you want to compare it to.

Here's an example:

SELECT s.id
FROM SubSequences AS s
LEFT OUTER JOIN (
    SELECT id
    FROM SubSequenceElements
    WHERE SubSequenceElement NOT IN (1,2,3,4,5,6,7,8)
  ) AS invalid USING (id)
WHERE invalid.id IS NULL;

In other words, you want to return rows from SubSequences such that no match is found in SubSequenceElements with an element value that is not in the set you're trying to match.

It's a bit confusing, because you have to think about the problem is a double-don't-match-this-set problem. But once you get relational division, it can be very powerful.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

If the set can be represented by the numbers 0 through 63 (or some subset of that), then...

Using a column like this

elements BIGINT UNSIGNED NOT NULL DEFAULT '0'

Then "2-5-7-8" could be put into it thus:

UPDATE ...
    SET elements = (1<<2) | (1<<5) | (1<<7) | (1<<8);

Then various operations can be done in a single expression:

WHERE elements = (1<<2) | (1<<5) | (1<<7) | (1<<8)  -- Test for exactly that set

WHERE (elements ^ ~ ( (1<<2) | (1<<5) | (1<<7) | (1<<8) )) != 0
    -- checks to see if any other bits are turned on

This last example is close to what you need. One side of the "and not" would have the 1..8 of your example, the other would have

Your example has S represented as 0x1FE;

WHERE subsequence & ~0x1FE

will be 0 (false) for ids 1,2,3; non-zero (true) for ids 4 and 5.

Rick James
  • 135,179
  • 13
  • 127
  • 222