0

The MySQL query below uses PHP to pull in the $sector, which is a single digit, and the $subsector_text, which is a comma separated string. The $subsector_text could be a single digit or a list of several IDs, such as "3,4,7,9".

  $sql = "
SELECT DISTINCT a.id
              , a.name
              , a.category_id
              , a.sector
              , a.subsector
              , a.year_in_operation
              , a.state
              , a.total_value
              , b.country_id
              , b.project_id
              , c.isocode_3
              , c.name
           FROM com_barchan_project a
           JOIN com_barchan_location b
             ON b.project_id = a.id
           JOIN com_barchan_country c
             ON c.id = b.country_id
           JOIN com_barchan_project_value_join d
             ON a.id = d.project_id
          WHERE a.state = 1 
            AND a.sector = '$sector'
            AND a.subsector REGEXP '^{$subsector_text}[,]|[,]{$subsector_text}[,]|[,]{$subsector_text}$|^{$subsector_text}$'
          ORDER 
             BY a.total_value DESC
              , a.category_id ASC
              , a.name ASC
";

The problem I'm having with the query above is with the line:

AND a.subsector REGEXP '^{$subsector_text}[,]|[,]{$subsector_text}[,]|[,]{$subsector_text}$|^{$subsector_text}$'  

If the $subsector_text = "3,4,5,9", then it's only returning records that contain exactly "3,4,5,9" in the $subsector field.

The desired result is that it would return any record that has any of the values in the $subsector_text. For instance, all these should be returned, but are currently not. This list is an example and by no means exact.

1,3
1,5
1,3,7,9
3,5
3,4,5,9
9
3
5
4

How do I change the query to select any records that has a value in that's in the $subsector_text string?

Please NOTE: That if the $subsector_text = 11, then the following, as an example, should not be selected.

1
12
21

Any help would be greatly appreciated.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 10
    I think you've discovered why using comma-separated values in a database is a super bad idea. Can you adjust your schema to make it more in line with relational database conventions? – tadman May 16 '17 at 18:04
  • Not without a re-write of various parts of the app. –  May 16 '17 at 18:09
  • 1
    You either tackle this problem now, even if it's a giant hassle, or you can try and pave it over with lots and lots and *lots* of code that really shouldn't be there in the first place. This is how insurmountable technical debt occurs: Delaying the inevitable. – tadman May 16 '17 at 18:10
  • Find a meaningful subject! "Very Difficult MySQL Query" doesn't say anything... – Thomas Landauer May 16 '17 at 18:11
  • 1
    the subject should be: 'help with query to solve horrible design' – Randy May 16 '17 at 18:12
  • Unfortunately, it's a design that I inherited. –  May 16 '17 at 18:14
  • @JohnLeger TBH, it might be a good idea to cover that whole thing with tests. Both unit test and integration tests (against SQLite). And then refactor it all. It might end up cheaper in long run. Comma-separated-values are basically the worst thing you can introduce in a DB. – tereško May 16 '17 at 18:29
  • To add to what @tadman says ... any code that you write to cover for this design flaw will most likely not be as efficient and scale-able as would a good relational schema be. If I were you I would re-factor the tables and the associated code especially if you inherited it. There is no reason to propagate and add more technical debt. – objectNotFound May 16 '17 at 18:42
  • 1
    Thanks everyone! I certainly agree that re-factoring is the best way to deal with this issue, but it's not always immediately possible. –  May 16 '17 at 19:07

3 Answers3

2

It's not practical to match any value in a comma-separate string against any value in another comma-separated string in a single predicate.

You can use FIND_IN_SET() to search for one value at a time.

This means you need multiple predicates, one for each value you get by splitting your input $subsector_text. So split your variable and map it into a series of FIND_IN_SET() calls.

I haven't tested the following code, but it should give you the idea of what I'm talking about:

$subsector_array = array_map('intval', explode(',', $subsector_text));
$subsector_terms = array_map(
  function ($id) { return "FIND_IN_SET($id, a.subsector)"; },
  $subsector_array);
$subsector_expr = implode(' OR ', $subsector_terms);

$sql = "
SELECT ...
          WHERE a.state = 1 
            AND a.sector = '$sector'
            AND ($subsector_expr)
...";

This will of course force a table-scan because there's no way to index FIND_IN_SET(), or any other operation that searches for substrings. Well, I suppose your conditions on a.state and a.sector will use an index to narrow down the search before applying the FIND_IN_SET() conditions.

I understand the dilemma of having to work with a system that you inherited. Let your manager know that this needs to get refactored at some point, because it will never be efficient or reliable the way it's designed now.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

Your approach is correct, but need some modifications. Instead of try to match in only one condition (REGEXP), can create multiple conditions joined with OR...

Example:

$subsectorArray = explode(',', $subsector_text);
$or = [];
foreach ($subsectorArray as $subsector){
    $or[] = "a.subsector REGEXP '[^[:alnum:]]{$subsector}[^[:alnum:]]|^{$subsector}[^[:alnum:]]|[^[:alnum:]]{$subsector}$|^{$subsector}$'";
}
$orStr = implode(' OR ', $or);

 $sql = "
SELECT DISTINCT a.id
              , a.name
              , a.category_id
              , a.sector
              , a.subsector
              , a.year_in_operation
              , a.state
              , a.total_value
              , b.country_id
              , b.project_id
              , c.isocode_3
              , c.name
           FROM com_barchan_project a
           JOIN com_barchan_location b
             ON b.project_id = a.id
           JOIN com_barchan_country c
             ON c.id = b.country_id
           JOIN com_barchan_project_value_join d
             ON a.id = d.project_id
          WHERE a.state = 1 
            AND a.sector = '$sector'
            AND ($orStr)
          ORDER 
             BY a.total_value DESC
              , a.category_id ASC
              , a.name ASC
";
rafrsr
  • 1,940
  • 3
  • 15
  • 31
  • Do or do not. There is no "try". A ***good answer*** will always have an explanation of what was done and why it was done in such a manner, not only for the OP but for future visitors to SO. – Jay Blanchard May 16 '17 at 21:43
0

The solution was to refactor the app. It took a couple days, but the offending code is gone and a new subsector table was created. Thanks everyone.