1

I have the values of 'foo', 'bar' and 'buz'. I'd like to select rows 1 and 3 from the following table:

+----+------+------+------+
| Id | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | foo  |      |      |
|  2 | foo  | bar  | buz  |
|  3 |      | bar  | buz  |
+----+------+------+------+

What would be the optimal MySQL query for that considering the following:

  • empty spaces are null
  • I have n columns where 4 < n < 20
  • there will be couple hundred rows
  • I will be selecting particular combinations of columns in a row where only provided values can be set and the rest needs to be null. E.g.: I need a query of this kind

SELECT * FROM my_table WHERE Col1 = foo OR (Col2 = bar AND Col3 = buz);

that'll return only rows 1 and 3.

In other words I'd like to have a query that returns rows only with certain combinations of values, not the rows where coln = 'some_val' and we dont' care about the others.

The reason for this is that I'd like to have 5 combinations in 1 query and I'd like to avoid writing (...) coln <> NULL(...) everywhere.

I'd appreciate your suggestions.

Alan
  • 1,322
  • 1
  • 21
  • 36
  • You want to have a query that returns a records which have at least on entry in one of the three columns col1, col2, col3? – timbmg Feb 25 '15 at 11:17
  • can you explain your last bullet-point a bit more? – Chris Feb 25 '15 at 11:26
  • @blckbird, Chris: I've edited my question – Alan Feb 25 '15 at 11:37
  • Could you please give a few more examples? In particular, which, if any, of the following rows would you like returned: `(4, foo, bar, NULL)`, `(5, xyzzy, NULL, NULL)`, `(6, bar, NULL, NULL)`, `(7, NULL, NULL, NULL)`? – Ilmari Karonen Feb 25 '15 at 11:40
  • @Ilmari Karonen I would like the db to assume, that for every field not provided in a query it needs to be NULL for the whole row to be returned. Even if provided values match. – Alan Feb 25 '15 at 11:51

2 Answers2

2

You would use where clause. To find the three values:

where 'foo' in (col1, col2, col3, . . . ) or
      'bar' in (col1, col2, col3, . . . ) or
      'buz' in (col1, col2, col3, . . . )

Limiting the results to those three values is more difficult. You need to do a comparison on each column:

where ( (col1 in ('foo', 'bar', 'buz') or col1 is null) or
        (col2 in ('foo', 'bar', 'buz') or col2 is null) or
        . . . 
      )

You can use a spreadsheet to generate the code, if you don't want to type it all in.

Another alternative would be the following kludge:

where replace(replace(replace(concat_ws('', col1, col2, col3, . . . ),
                              'foo', ''
                             ), 'bar', ''
                      ), 'buz', ''
             ) = ''

This concatenates the values together and then removes the ones you care about. If everything is removed, then the row matches.

As a note: this problem suggests that you have a poor data structure. You are storing things in columns that should be in rows.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon, I'm going through your answer. Just a comment on the last sentence. I could store these details in separate tables but then I'd have x5 db queries in my script. Also, there will be an 'infinite' number of rows and finite number of columns. – Alan Feb 25 '15 at 11:40
  • @Alan . . . You would store the values in a single table, with one row per column. That is, you would use a normalized data format. – Gordon Linoff Feb 25 '15 at 12:09
  • . . I am not sure if serialization is the best option here. If I serialize this data I will have to do `SELECT *` each time. With this structure and `WHERE` clause I will be selecting max 5 results from couple hundred rows a query. Unless it's more efficient to do it that way.... Thanks for the suggestion, I'll have to research it. – Alan Feb 25 '15 at 14:24
1

I would like the db to assume, that for every field not provided in a query it needs to be NULL for the whole row to be returned. Even if provided values match.

In that case, the trivial solution would be:

SELECT * FROM my_table WHERE
  (Col1 = 'foo' AND Col2 IS NULL AND Col3 IS NULL) OR
  (Col1 IS NULL AND Col2 = 'bar' AND Col3 = 'buz') OR
  /* ... */

In many cases, this can be a perfectly good solution. However, it can sometimes suffer suboptimal performance due to MySQL's poor index use for OR queries. In particular, if you have a combined index on all the columns, and want to make full use of it, it can be better to rewrite the query to use UNION (or UNION ALL) instead of OR:

SELECT * FROM my_table WHERE
  (Col1 = 'foo' AND Col2 IS NULL AND Col3 IS NULL) UNION ALL
SELECT * FROM my_table WHERE
  (Col1 IS NULL AND Col2 = 'bar' AND Col3 = 'buz') UNION ALL
  /* ... */

See this SQLFiddle for a demonstration of the two techniques, and compare the execution plans (the execution times are rather meaningless with such a small table; you'd need many more rows to see any systematic difference). Looking at the KEY_LEN and REF columns, you can see that, for the OR query, MySQL ends up using the index only for Col1, whereas with UNION ALL, it can use the whole index.

Community
  • 1
  • 1
Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153