1

You might want to have a look at my previous question.

My database schema looks like this

         ---------------                              ---------------   
         | candidate 1 |                              | candidate 2 |
         --------------- \                             --------------      
           /              \                                 |
       -------              --------                        etc
       |job 1|              | job 2 |  
       -------              ---------  
        /     \              /      \  
  ---------   ---------  ---------   --------  
  |company |  | skills | |company | | skills |  
  ---------   ---------  ---------- ----------  

Here's my database:

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+

.

mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field          | Type     | Null | Key | Default | Extra          |
+----------------+----------+------+-----+---------+----------------+
| candidate_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| candidate_name | char(50) | NO   | MUL | NULL    |                |
| home_city      | char(50) | NO   | MUL | NULL    |                |
+----------------+----------+------+-----+---------+----------------+

.

mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+

| Field             | Type          | Null | Key | Default | Extra          |
+-------------------+---------------+------+-----+---------+----------------+
| company_id        | int(11)       | NO   | PRI | NULL    | auto_increment |
| company_name      | char(50)      | NO   | MUL | NULL    |                |
| company_city      | char(50)      | NO   | MUL | NULL    |                |
| company_post_code | char(50)      | NO   |     | NULL    |                |
| latitude          | decimal(11,8) | NO   |     | NULL    |                |
| longitude         | decimal(11,8) | NO   |     | NULL    |                |
+-------------------+---------------+------+-----+---------+----------------+

.

Note that I should probably call this skill_usage, as it indicates when a skill was use don a job.

mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

.

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

So far, my MySQL query looks like this:

SELECT DISTINCT can.candidate_id, 
                can.candidate_name, 
                     can.candidate_city,        
                     j.job_id, 
                     j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y")  AS start_date, 
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,        
                s.skill_id  
FROM  candidates AS can       
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id     
  INNER JOIN companies AS co ON j.company_id = co.company_id        
         INNER JOIN skills AS s ON s.job_id = j.job_id 
            INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id 
   AND sn.skill_id = s.skill_id 
ORDER by can.candidate_id, j.job_id

I am getting output like this, but am not satisfied with it

   +--------------+----------------+---------------------+--------+------------+------------+------------+----------+
   | candidate_id | candidate_name | candidate_city      | job_id | company_id | start_date | end_date   | skill_id |
   +--------------+----------------+---------------------+--------+------------+------------+------------+----------+
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        1 |
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        2 |
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        1 |
   |            1 | Pamela Brown   | Cardiff             |      2 |          2 | 2018-06-01 | 2019-01-31 |        3 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        4 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        5 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        6 |
   |            1 | Pamela Brown   | Cardiff             |      4 |          3 | 2016-08-01 | 2017-11-30 |        1 |
   |            2 | Christine Hill | Salisbury           |      5 |          2 | 2018-02-01 | 2019-05-31 |        3 |

Now, I would like to restrict the search, by specifying "skill", like Python, C, C++, UML, etc and company names

The user will enter something like Python AND C++ into a skill search box (and/or Microsoft OR Google into a company name search box).

How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++) into skill Ids? Even so, how do I include that in my query?

Te make a few things clearer:

  • both the skills & company search box can be empty, which I will interpret as "return everything"
  • search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)

It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id, which I think will handle a search for a single skill, given its ... name ? ... Id?

I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google and has the skills Python AND C++?

If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.

I think that I want to extend that second last line AND sn.skill_id = s.skill_id by paring the skills search string, in my example Python AND C++ and generating some SQL along the lines of AND (s.skill_id = X ), where X is the skill Id for Python, BUT I don't know how to handle Python AND C++, or something more complex, like Python AND (C OR C++) ...

Update

Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT)).

Final update

The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.

That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries; a first t get the candidates with matching skills, then a second to get all of their jobs).

halfer
  • 19,824
  • 17
  • 99
  • 186
Mawg says reinstate Monica
  • 38,334
  • 103
  • 306
  • 551
  • 2
    What causes you a problem? Parsing the string `"python and c++"` to get only the keywords `"python"` and `"c++"` ? Finding results that matches **all** keywords ? – Cid Aug 05 '20 at 12:15
  • Parsing is trivial. I am pretty weak on MySql, though, and that's quite a complex query :-/ – Mawg says reinstate Monica Aug 05 '20 at 12:21
  • You'd need somethink like `select id, count(*) from tablename where fieldname IN (your criteria) group by id having count(*) = your number of criterias`. This could be used as subquery. Let me find a duplicate – Cid Aug 05 '20 at 12:26
  • https://stackoverflow.com/questions/4047484/selecting-with-multiple-where-conditions-on-same-column – Cid Aug 05 '20 at 12:28
  • 1
    Hey @MawgsaysreinstateMonica. You would need to provide sample data and desired results to clarify your question. It is not easy to retro-engineer the result you want from just your existing query and the descriptions of the tables. – GMB Aug 05 '20 at 17:43
  • Is the desired output that I added enough? Sample data can be inferred from the `DESCRIBE` queries. – Mawg says reinstate Monica Aug 06 '20 at 09:34
  • Your schema is not the same as the table definitions. – Nikkorian Aug 15 '20 at 10:45
  • 1
    You say "The user will enter something like Python AND C++ into a skill search box" - why not use a multi-select or check boxes, whatever, so each skill/company is a separate datum - i.e. normalise the input data and predefine the companies, skills etc so there is no chance of typos etc resulting in ambiguity – Nikkorian Aug 15 '20 at 10:54
  • A nice thought, but with hundreds, possibly thousands of companies and even more skills, it would be an ***extremely*** busy GUI. Either a screen full of extremely tiny check boxes, or some extremely long drop downs. Even then, who would I nest/group the boolean operators? I can't figure out how to develop a GUI to allow the user to enter `(C AND kernel) OR ((C++ AND UML) OR (assembler))` – Mawg says reinstate Monica Aug 15 '20 at 12:33
  • (Minor aside: named individuals can't be pinged by adding their `@` handle in a question. We discourage call-outs in posts anyway, since they are better written for a wide general audience). – halfer Aug 16 '20 at 12:02

2 Answers2

3

The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.

Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).

The query you show has a duplication - AND sn.skill_id = s.skill_id duplicates the terms of your join. Is that intentional?

To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.

Your query then becomes:

SELECT DISTINCT can.candidate_id, 
                can.candidate_name, 
                can.candidate_city,        
                j.job_id, 
                j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y")  AS start_date, 
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,        
                s.skill_id  
FROM  candidates AS can       
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id     
  INNER JOIN companies AS co ON j.company_id = co.company_id        
  INNER JOIN skills AS s ON s.job_id = j.job_id 
  INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id 
AND skill_id in (?, ?, ?)
OR skill_id in (?)
ORDER by can.candidate_id, j.job_id

You need to substitute the question marks for the input your users have entered. EDIT

The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:

AND skill_id in (select skill_id from skill_name where skill_name_normalized in (?, ?, ?))

The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.

You've narrowed down your requirements somewhat (I may misunderstand). I believe your requirements are

I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.

To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F)). There are two filters: (A and (B OR C)) and (D AND (E OR F)). The first filter has two skill groups: A and (B OR C).

It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes

SELECT DISTINCT can.candidate_id, 
                can.candidate_name, 
                can.candidate_city,        
                j.job_id, 
                j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y")  AS start_date, 
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,        
                s.skill_id  
FROM  candidates AS can       
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id     
  INNER JOIN companies AS co ON j.company_id = co.company_id        
  INNER JOIN skills AS s ON s.job_id = j.job_id 
  INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id 
AND 
  (skill_id in (A) and skil_id in (B, C))
OR 
  (skill_id in (D) and skil_id in (E, F))
ORDER by can.candidate_id, j.job_id
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • "`it only retrieves people whose job has a skill (which may not be all jobs)`" .. well, it is my intention to show only those jobs where it used a skill, or those companies at which it was used - if the user specifies a skill(s) or company(s). If either skill or company are blank, treat them as wildcard. But, if I specify C++, don't show me those jobs where C was the coding language – Mawg says reinstate Monica Aug 10 '20 at 08:21
  • "`The query you show has a duplication - AND sn.skill_id = s.skill_id duplicates the terms of your join. Is that intentional?`". Nope, that is just me being an idiot – Mawg says reinstate Monica Aug 10 '20 at 08:21
  • Wah! I believe that the key here is in `IN` clause. Excellent. I will try this out & get back to you. Thanks! (although it could get hairy with things like `(A and (B OR C)) OR (D AND (E OR F)). I need to consider that, but fear that users may want full flexibility (and drop downs won't work with several hundred, possibly thousand, skills) – Mawg says reinstate Monica Aug 10 '20 at 08:23
  • Hmm, I can't see how to handle user input like ` `(A and (B OR C)) OR (D AND (E OR F))` for the skill. That's the blocker for me. I guess that I have to use PHP and parse it, but don't really see how. I suppose that I can convert the ORs to IN, and leave the AND, as is? But, my stumbling block is that candidate may have from one job and B from another, in which case I want to return both jobs. ANd, if he has both A nd B, plus one for the OR clause, I still want to show all jobs. I am not sure if I cam explaining clearly. – Mawg says reinstate Monica Aug 10 '20 at 14:03
1

Building a bit off previous comments and answers... if handling input like
(A and (B OR C)) OR (D AND (E OR F)) is the blocker you could try moving some of the conditional logic out of the joins and filter instead.

WHERE (
          ((sn.skill_id LIKE 'A') AND ((sn.skill_id LIKE ('B')) OR (sn.skill_id LIKE('C')))) 
       AND ((co.company_id IN (1,2,3)) AND ((can.city = 'Springfield') OR (j.city LIKE('Mordor'))))
     )

You can build your query string based off used input, search out Id's for selected values and put them into the string and conditionally build as many filters as you like. Think about setting up add_and_filter and add_or_filter functions to construct the <db>.<field> <CONDITION> <VALUE> statements.

$qs = "";
$qs .= "select val from table";
...
$qs .= " WHERE ";
if($userinput){ $qs += add_and_filter($userinput); }

alternately, look at a map/reduce pattern rather than trying to do it all in SQL?