You might want to have a look at my previous question.
My database schema looks like this
--------------- ---------------
| candidate 1 | | candidate 2 |
--------------- \ --------------
/ \ |
------- -------- etc
|job 1| | job 2 |
------- ---------
/ \ / \
--------- --------- --------- --------
|company | | skills | |company | | skills |
--------- --------- ---------- ----------
Here's my database:
mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+----------------+
| job_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_id | int(11) | NO | MUL | NULL | |
| company_id | int(11) | NO | MUL | NULL | |
| start_date | date | NO | MUL | NULL | |
| end_date | date | NO | MUL | NULL | |
+--------------+---------+------+-----+---------+----------------+
.
mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+----------------+
| candidate_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_name | char(50) | NO | MUL | NULL | |
| home_city | char(50) | NO | MUL | NULL | |
+----------------+----------+------+-----+---------+----------------+
.
mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| company_id | int(11) | NO | PRI | NULL | auto_increment |
| company_name | char(50) | NO | MUL | NULL | |
| company_city | char(50) | NO | MUL | NULL | |
| company_post_code | char(50) | NO | | NULL | |
| latitude | decimal(11,8) | NO | | NULL | |
| longitude | decimal(11,8) | NO | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
.
Note that I should probably call this skill_usage
, as it indicates when a skill was use don a job.
mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
.
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
So far, my MySQL query looks like this:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND sn.skill_id = s.skill_id
ORDER by can.candidate_id, j.job_id
I am getting output like this, but am not satisfied with it
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| candidate_id | candidate_name | candidate_city | job_id | company_id | start_date | end_date | skill_id |
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 2 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 2 | 2 | 2018-06-01 | 2019-01-31 | 3 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 4 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 5 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 6 |
| 1 | Pamela Brown | Cardiff | 4 | 3 | 2016-08-01 | 2017-11-30 | 1 |
| 2 | Christine Hill | Salisbury | 5 | 2 | 2018-02-01 | 2019-05-31 | 3 |
Now, I would like to restrict the search, by specifying "skill", like Python, C, C++, UML, etc and company names
The user will enter something like Python AND C++
into a skill search box (and/or Microsoft OR Google
into a company name search box).
How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++
) into skill Ids? Even so, how do I include that in my query?
Te make a few things clearer:
- both the skills & company search box can be empty, which I will interpret as "return everything"
- search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)
It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id
, which I think will handle a search for a single skill, given its ... name ? ... Id?
I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google
and has the skills Python AND C++
?
If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.
I think that I want to extend that second last line AND sn.skill_id = s.skill_id
by paring the skills search string, in my example Python AND C++
and generating some SQL along the lines of AND (s.skill_id = X )
, where X is the skill Id for Python, BUT I don't know how to handle Python AND C++
, or something more complex, like Python AND (C OR C++)
...
Update
Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT))
.
Final update
The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.
That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries; a first t get the candidates with matching skills, then a second to get all of their jobs).