1

These are the only two tables that are germane. No need to bother you with the others.

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

Basically, users enter a boolean search string, using skill names.

I will convert the skill snames to skill_id, and then want to generate a MySql query to get all matching job_id from table skill_usage, by parsing the user's search string.

Strings can contain skill name, the operators AND and OR, plus brackets for precedence.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

But there is no limit to the complexity of expression allowed - and that's my problem.

I'm no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage then parse, and build up the rest of the query.

For the first example, just skill name C, I want to add WHERE skillId = X, where X is gotten from the table skill_names.

For a simple OR, like C OR C++, I can use an IN clause - WHERE skillId IN (X, Y) (again, X & Y are lookups of the skill names to get a skill_id).

For a simple AND, like C++ AND UML, I reckon I need an INNER JOIN, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y (where X is the skill_id for C++ and Y for UML).

I think that is roughly correct, for those simple cases (?).

But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML), I get confused.

Would a regex or an algorithm be appropriate here?

@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).

Update

I am trying to separate the parsing from the queries, and am using this question to sort out the queries.

I am getting answers like

SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);

and

select s1.job_id
  from skill_usage s1
  where s1.skill_id = 3
    and s1.job_id in (
                       select s2.job_id
                         from skill_usage s2
                        where s2.skill_id = 4
                     )

Where the latter looks more extensible.

Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly

fail if mis-matched brackets

reduce multiple spaces to single
removes spaces before and after closing/opening bracket  "( " & " )"

foreach c in string

   if c == (
   
   else
      if c === )
      
      else
         if AND
         
         else
           if OE
           
           else
              # it's a skill name
wildplasser
  • 43,142
  • 8
  • 66
  • 109
Mawg says reinstate Monica
  • 38,334
  • 103
  • 306
  • 551
  • 1
    Why look up skill_name to get skill_id? Just use skill_name since you already have it – AbraCadaver Aug 14 '20 at 18:30
  • 1
    Also, I wouldn't let user enter that string. Use dropdowns or let them add to a multiple select and construct it yourself from that. In this way you could present them with skill_name associate skill_id in the form and use that if the query is easier. – AbraCadaver Aug 14 '20 at 18:33
  • how many jobs will you have? how many skills per job will there be, both average and max? how many different skills are there? – ysth Aug 14 '20 at 18:42
  • @ AbraCadaver I guess that you want me to add another `INNER JOIN` to convert `skill_name` to `skill_id`? As it is, I already have code to read that table once at startup and use it as a local cache in my PHP. I am not sure which would execute more quickly (might be worth profiling), but it certainly makes the SQL query less complex :-) – Mawg says reinstate Monica Aug 14 '20 at 19:01
  • Also, I think that it would be ***much*** more complex to add GUI as a visual query builder; espeically with enough comlexity. It seems to be easier to let the user just inout text. – Mawg says reinstate Monica Aug 14 '20 at 19:03
  • @ ysth I cannot predict what the data will be like. How many kills might there be in a recruiter's database? Hundreds, certainly, possibly thousands (which is another reason that I don't like the idea of a GUI).. Skills per job? I usually list 6 to 8, but you might be different. – Mawg says reinstate Monica Aug 14 '20 at 19:06
  • Thats same as post -> tags relation, and there are a lot of examples – flakerimi Aug 15 '20 at 10:11
  • Sorry, I don't understand what you mean? – Mawg says reinstate Monica Aug 15 '20 at 10:17
  • 1
    http://lekkerlogic.com/2016/02/site-tags-using-mysql-many-to-many-tags-schema-database-design/ – flakerimi Aug 15 '20 at 10:50
  • 1
    Here is what you need: https://stackoverflow.com/questions/9130284/mysql-many-to-many-relationship-query-how-to-get-all-tags-of-filtered-posts – flakerimi Aug 15 '20 at 10:54
  • LOlx! Thanx a 1,000,000. I wish that I had seen that 5 years ago when I was doing an event system where visitors wore RFID tags :-) – Mawg says reinstate Monica Aug 15 '20 at 12:24
  • 1
    Hint: dont JOIN, just emit `AND EXISTS(...from skill_usage WHERE ...)` for every skills-tag you encounter in the source string. `( ) OR AND NOT` can be emitted as is. – wildplasser Aug 16 '20 at 12:27

1 Answers1

2

Simple query generator, assuming PDO


        ## for simple tokenisation, the terms are separated by space here.
        ## ###############################################################
$string = "( C AND kernel ) OR ( C++ AND UML )";

function emit_term( $tag ) {
$res = " EXISTS (
                SELECT *
                FROM skill_usage su
                JOIN skill_names sn ON sn.skill_id = su.skill_id
                WHERE su.Job_id = j.job_id
                AND sn.skillname = :" . $tag . ")\n";
return $res;
}


$fixed_part ="
SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND \n" ;


# $tokens = explode( ' ' , $string ); #splits on any single space
$tokens = preg_split( '/[\s]+/' , $string ); # accepts multiple whitespace
# print_r ( $tokens );

$query = $fixed_part;

$args = array();
$num = 1;
foreach ( $tokens as $tok ) {
        switch ($tok) {
        case '':  # skip empty tokens
        case ';':  # No, you should not!
        case '"':
        case "'":
        case ';':  break;
        case '(': $query .= '('; break;
        case ')': $query .= ')'; break;
        case '&':
        case 'AND': $query .= ' AND '; break;
        case '|':
        case 'OR': $query .= ' OR '; break;
        case '!':
        case 'NOT': $query .= ' NOT '; break;
        default:
                $tag = '_q' . $num ;
                $query .= emit_term ( $tag );
                $args[$tag] = $tok;
                $num += 1;
                 break;
                }
        }
$query .= ";\n\n";

echo "Query + parameters (for PDO):\n" ;
echo $query;
print_r ( $args) ;
          

Output:


SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND 
( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q1)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q2)
) OR ( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q3)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q4)
);

Array
(
    [_q1] => C
    [_q2] => kernel
    [_q3] => C++
    [_q4] => UML
)

                     
wildplasser
  • 43,142
  • 8
  • 66
  • 109