-25

Could storing wildcard strings in a table's column (to be used as the second operand of the LIKE operator in queries) cause any non-obvious behavior? I'm wondering especially about the possibility of unexpected query results or security issues.

Here's an example usage I'm wondering about:

Example table:

| ID        | String              |
|-----------|---------------------|
| 1         | A__XX____5__________|
| 2         | A__XX____6__________|
| 3         | A__YX____5__________|
| 4         | B__XX____5__________|
| 5         | A__XX____5__________|
| 6         | A__XX____7__________|
| 7         | A__YY____5__________|

Example query:

SELECT ID
FROM ExampleTable
WHERE 'AVVYXZZZZ5ABCDEFGHIJ' LIKE String;

Query result:

| ID        |
|-----------|
| 3         | 

Is this a valid and idiomatic way to use them? Are there any examples in some documentation or other reference material that uses SQL wildcards like this?

ruohola
  • 21,987
  • 6
  • 62
  • 97
  • 1
    Not really sure what you are asking. What kind of issues are you concerned with here? And depending on how you build your actual query you could have injection issues but the query you posted does not have that issue. – Sean Lange May 07 '19 at 16:33
  • 2
    It looks fine to me. – Cetin Basoz May 07 '19 at 16:33
  • 2
    I would expect that query to be on the slow side - does "performance issue" count in "issues"? – Brian May 07 '19 at 16:34
  • 8
    Being discussed at https://meta.stackoverflow.com/questions/384703/user-vandalizes-and-deletes-a-question-with-close-votes-then-repost-question?cb=1 – Cowthulhu May 07 '19 at 19:36
  • 2
    @Broman "any issues" is subjective based on the needs. A lower performance might be an issue. However, if there isn't a lot of data or the delay is acceptable, it might not be an issue. Not sure in this case but I've seen somewhat similar situations where the return result brought up *seemingly* unrelated items. For users, as they got a different presentation of the data and it wasn't intuitive that the search was performed correctly - as far as they were concerned, it seemed random results were included. Whether this kind of thing needs fixing (somehow) still varies. – VLAZ May 08 '19 at 13:35
  • 1
    @Broman that makes it **Too Broad** as well, in the end it is off-topic for many reasons (these are just 2 mentioned so far), the specific one that *is best* for it to be closed under is irrelevant and opinion based, further justifying its closure even more. Arguing to keep things open like this does not help the site. –  May 08 '19 at 13:56
  • 3
    @JarrodRoberson I completely agree that it is to broad – klutt May 08 '19 at 13:57
  • 1
    @Fleshy Read the other comments and the Meta thread. People have already explained their opinions. –  May 15 '19 at 13:02

3 Answers3

15

The security flaw would -- conceivably -- arise if user input is put directly into the table without validation and users are limited to what they can see.

That is, if '%' could allow someone to see data they shouldn't.

However, using a column name for the like pattern is not a SQL injection risk, in the sense that it cannot cause another command to "inadvertently" run. And if you are putting the patterns into the table for matching purposes, there is no additional risk.

There might be a concern with performance, but that is another issue entirely.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
10

There are no basic security flaws inherent in this practice. However, you might need to parse or strictly control the format of the input strings so you don't end up with entries like these:

 | ID        | Identifier          |
 | --------- |---------------------|
 | 8         | A%                  |
 | 9         | %                   |

Also note that it is unlikely that an attacker would choose to look for flaws in a pattern usage like this, since it is very uncommon.

Issues are possible if a new pattern of data inadvertently matches an existing filter string, incorrectly returning the new entry for the old filter. However, good data formatting practices should be able to prevent any issues like this.

Laughing Vergil
  • 3,706
  • 1
  • 14
  • 28
  • 2
    I've worked with a similar 'mapping' approach where a 'priority' column would /handle/ overlapping masks by having the code pick the match with the highest priority. Performance was surprisingly well given the amount of string-operations the system had to go through, even for couple of hundred filters applied to 250k+ records; and that was when PIII's were considered top-notch =) – deroby May 09 '19 at 20:52
2

I have used this type of technique quite extensively and did not have any performance issues or security issues due to this. The use case is typically for storing some rules so the data set has only a few rows to deal with - so performance is never an issue. Use cases where this is quite useful is, for example:

Having mapping tables during integrations. So A1, A2, A3 on system 1 need to be sent as X to other system. Using wild cards one row can be used.

I always put a sort sequence# to resolve conflict and resolving to inadventent value becomes bug/feature rather than a security flaw.

Another varient I have used quite often -- where the engine needs to account for SQL Injtection type of attack -- is to go a step further and put in conditions. So store a full PL/SQL or any other interpretted language condition like @a = 'A' and @b = '2' ....

Preventing against SQL Injection is easy but it ends up saving a lot of code.

So back to main question - the technique will work out just fine.

Saad Ahmad
  • 393
  • 1
  • 7