-3

I am coding an app for a friend, and SQL is not my strong suit. I thought that I had laid this matter to rest with my previous question, which received an excellent answer.

However, my friend has moved the goal posts yet again (and swears that it is final this time).

Given these tables

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

The user inputs a text for skill search.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

And I want to parse that in PHP and generate an appropriate SQL query. Something along the lines of the answer to my previous question, but it will need to be tweaked.

Previously, I had believed that if the user input two skills and ANDed them e.g C++ and UML, then I should return details of any job where BOTH skills were used.

Now, he says that he just wants candidates who have used both of those skills (hence the AND), but not necessarily on the same job. To clarify, my PHP API will return an array of candidates, each entry of which will have an array of jobs.

I want to tweak the code in the answer to my previous question to achieve this. I wondered if I would probably end up doing the whole thing in PHP, with a bunch of nested FOR loops, rather than offloading it onto the SQL engine.

Update

I can't find a good fiddle site that allows both PHP and MySQL.

If anyone wants to test their own code on localhost, the main parts of the test database look like this:

Test search 3.1.5 (Python AND UML) OR (C++ OR UML)

Candidate name  Company Job year    Skills     Overall match
One             Thales      2015    C           No
One             BAe         2016    Python      No
One             Google      2017    C++         No
Two             BAe         2015    C++         Yes
Two             Google      2020    Python      Yes
Two             Google      2011    C++, UML    Yes
Three           Thales      2019    Python, UML Yes

and here is an SQL script to create the test database:

CREATE TABLE `candidates` (
  `candidate_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_name` CHAR(50),
  `candidate_city` CHAR(50),
  `latitude` DECIMAL(11,8),
  `longitude` DECIMAL(11,8),
     PRIMARY KEY (candidate_id));

CREATE TABLE `companies` (
  `company_id` INT(11) NOT NULL AUTO_INCREMENT,
  `company_name` CHAR(50) NOT NULL,
  `company_city` CHAR(50) NOT NULL,
  `company_post_code` CHAR(50) NOT NULL,
  `latitude` DECIMAL(11,8) NOT NULL,
  `longitude` DECIMAL(11,8) NOT NULL,
     PRIMARY KEY (company_id));

CREATE TABLE `jobs` (
  `job_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_id` INT(11) NOT NULL,
  `company_id` INT(11) NOT NULL,
  `start_date` DATE NOT NULL,
  `end_date` DATE NOT NULL,
     PRIMARY KEY (job_id));


CREATE TABLE `skill_names` (
  `skill_id` INT(11) NOT NULL AUTO_INCREMENT,
  `skill_name` CHAR(32) NOT NULL,
     PRIMARY KEY (skill_id));


CREATE TABLE `skill_usage` (
  `skill_id` INT(11) NOT NULL,
  `job_id` INT(11) NOT NULL);

INSERT INTO `skill_names` (skill_name) VALUES("C");
INSERT INTO `skill_names` (skill_name) VALUES("Python");
INSERT INTO `skill_names` (skill_name) VALUES("C++");
INSERT INTO `skill_names` (skill_name) VALUES("UML");

INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("One",   "Hastings",   50.8543, 0.5735);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Two",   "Slough",  51.5105, 0.5950);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Three", "Stonehenge", 51.1789, -1.8262);

INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Thales", "Crawley",   "AB1 1CD", 51.1091, -0.1872);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("BAe",    "Rochester", "EF1 2GH", 51.3880, 0.5067);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Google", "East Ham",  "E6  0XX", 51.5334, 0.0499);

INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 1, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 2, "2016-01-010", "2016-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 3, "2017-01-010", "2017-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 2, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2020-01-010", "2020-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2011-01-010", "2011-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(3, 1, "2019-01-010", "2019-12-31");

INSERT INTO `skill_usage` (job_id, skill_id) VALUES(1, 1);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(2, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(3, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(4, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(5, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 4);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(7, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES (7, 4);
halfer
  • 19,824
  • 17
  • 99
  • 186
Mawg says reinstate Monica
  • 38,334
  • 103
  • 306
  • 551
  • I’m voting to close this question because Trying to lurch without any effort. – wildplasser Aug 25 '20 at 21:47
  • Well, I put a lot of effort into the test database, which I have now attached (after the question was closed, alas). Non-visible effort also included paying for a Udemy SQL course, but this really isn't my bailiwick, so S.O is my only hope. Any chance of reopening? I will try to hack together some code,and if it gets close, maybe someone could improve it. – Mawg says reinstate Monica Aug 26 '20 at 06:27
  • Just as a side note: bounties are to help reward an existing good answer, or to help an answer receive more views, or to encourage readers to tackle a difficult question. Fortunately (or unfortunately depending on your perspective) the question must still be on-topic. This means that it must not be a work order where nothing has been tried (and I suspect that was the reason for the closure in this case). I have tried to amend the language a bit to this end (I will leave it to voters to decide if this is enough). – halfer Aug 27 '20 at 08:20
  • In relation to a possible PHP solution - did you try that? Maybe that would solve the problem, and it may even be a more elegant solution. – halfer Aug 27 '20 at 08:21
  • 2
    My answer to the original question in the link can easily be modified, with the generated part as a scalar correlated subquery. And again: just dumping the DDL and asking for the SQL is not a question. SO is not a code writing service. – wildplasser Aug 27 '20 at 09:47
  • 1
    Possible duplicate of [Parsing a boolean expression into a MySql query in PHP](https://stackoverflow.com/questions/63418184/parsing-a-boolean-expression-into-a-mysql-query-in-php) – yivi Sep 01 '20 at 09:26
  • 1
    That was my previous question, of which this is part 2. I even link to it above – Mawg says reinstate Monica Sep 01 '20 at 12:34

1 Answers1

2

Example of working query in DB fiddle: https://www.db-fiddle.com/f/rQKazPgbtGS766WEiuiXyR/0

$string = '(C AND kernel) OR (C++ AND UML)';

//Adding spaces between parentheses to accept inputs without that space as well
$string = str_replace('(', ' ( ', $string);
$string = str_replace(')', ' ) ', $string);

//Splitting input into separate strings, taken from previous question
$tokens = preg_split('/[\s]+/', $string);

$query = '';

$args = [];
$n = 0;
//Transcripting all strings separately and each skill name into FIND_IN_SET function
foreach ($tokens as $tok) {
    switch ($tok) {
        case '':  # skip empty tokens
        case ';':  # No, you should not!
        case '"':
        case "'":
        case ';':
            break;
        case '(':
            $query .= '(';
            break;
        case ')':
            $query .= ')';
            break;
        case '&':
        case 'AND':
            $query .= ' AND ';
            break;
        case '|':
        case 'OR':
            $query .= ' OR ';
            break;
        case '!':
        case 'NOT':
            $query .= ' NOT ';
            break;
        default:
            $arg = 'arg_' . $n;
            $args[$arg] = $tok;
            $query .= "FIND_IN_SET(':{$arg}', skills)";
            $n++;
            break;
    }
}

//Query grouping all of the skills used by candidates returning candidates
//with grouped column skills with HAVING clause containing all of the
//conditions they must match. So query returns only candidates who have ever
//used desired skills.
$sql = "SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills
FROM jobs
LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id
LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id
GROUP BY candidate_id
HAVING {$query}
ORDER BY candidate_id DESC";

Example output for (C AND kernel) OR (C++ AND UML):

SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills
FROM jobs
    LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id
    LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id
GROUP BY candidate_id
HAVING (FIND_IN_SET('C', skills) AND FIND_IN_SET('kernel', skills)) OR (FIND_IN_SET('C++', skills) AND FIND_IN_SET('UML', skills))
ORDER BY candidate_id DESC

Array of arguments:

Array
(
    [arg_0] => C
    [arg_1] => kernel
    [arg_2] => C++
    [arg_3] => UML
)
Jsowa
  • 9,104
  • 5
  • 56
  • 60
  • 4
    I already retracted my downvote. Still, your answer looks incomplete (what is find_in_set?) Plus: your query appears to be sensitive for SQL-injection. – wildplasser Aug 26 '20 at 12:30
  • 3
    `FIND_IN_SET()` is mysql function. Code works and retrieves what @MawgsaysreinstateMonica wanted. I don't understand people downvoting only because they don't know specific mysql function or sql injection. Of course it's sensitive because that's not the purpose of the question. I don't even know what ORM system is used. – Jsowa Aug 26 '20 at 12:52
  • 1
    @Tajni: It is the meta-effect, your answer was linked from here: https://meta.stackoverflow.com/questions/400823/question-with-working-answer-closed-i-want-to-award-a-bounty – user000001 Sep 03 '20 at 09:29