In my database, I have a table for companies. This table has a field called tags
which would contain something like:
Furniture Retail E-commerce B2C Home & Furniture Consumer Discretionary Furniture British manufacturer retailer contemporary vintage furniture products Designs
What I want to be able to do is query these tags and return companies based on whether or not this field contains any of the keywords or phrases a user might enter.
For example, if a user wants to find a company whose tags contain the word Retail
, it will generate a query like this:
SELECT
company.domain,
company.company_name,
CONCAT_WS(
',',
company.business_sector,
company.tags
) AS 'tags',
GROUP_CONCAT(
employee.employee_id SEPARATOR ','
) AS 'employee_ids',
COUNT(employee.employee_id) AS 'employees'
FROM
company
INNER JOIN employee ON company.domain = employee.domain
WHERE
company.tags REGEXP '^Retail| Retail |Retail$'
OR company.business_sector LIKE '%Retail%'
AND company.domain NOT IN (
'@hotmail.com',
'@gmail.com',
'@aol.com'
)
GROUP BY
company.domain
This exact query returns 11424 results, which is great.
Now the part where it falls over, is when a user enters a keyword that SHOULD NOT be inside this field.
So let's say we don't want any Apparel
in there, it would generate this query:
SELECT
company.domain,
company.company_name,
CONCAT_WS(
',',
company.business_sector,
company.tags
) AS 'tags',
GROUP_CONCAT(
employee.employee_id SEPARATOR ','
) AS 'employee_ids',
COUNT(employee.employee_id) AS 'employees'
FROM
company
INNER JOIN employee ON company.domain = employee.domain
WHERE
company.tags REGEXP '^Retail| Retail |Retail$'
OR company.business_sector LIKE '%Retail%'
AND (
company.tags NOT REGEXP '^Apparel| Apparel |Apparel$'
AND company.business_sector NOT LIKE '%Apparel%'
)
AND company.domain NOT IN (
'@hotmail.com',
'@gmail.com',
'@aol.com'
)
GROUP BY
company.domain
This exact query returns 112 results, which should definitely not be the case as there are not 11312 companies with the keyword Apparel
in my database.
Any ideas on what I'm doing wrong
EDIT
This is not a duplicate... I can modify my query, but that's not where the problem lays.
For example, let's take those 11424 results from Retail
and put in a random phrase that we know will NEVER be in ANY of the results, we should get the same 11424 records:
SELECT
company.domain,
company.company_name,
CONCAT_WS(
',',
company.business_sector,
company.tags
) AS 'tags',
GROUP_CONCAT(
employee.employee_id SEPARATOR ','
) AS 'employee_ids',
COUNT(employee.employee_id) AS 'employees'
FROM
company
INNER JOIN employee ON company.domain = employee.domain
WHERE
(
company.tags REGEXP '^Retail| Retail |Retail$'
OR company.business_sector LIKE '%Retail%'
)
AND (
company.tags NOT REGEXP '^This phrase will never occur| This phrase will never occur |This phrase will never occur$'
AND company.business_sector NOT LIKE '%This phrase will never occur%'
)
AND company.domain NOT IN (
'@hotmail.com',
'@gmail.com',
'@aol.com'
)
GROUP BY
company.domain
Instead of getting 11424, I get 135 records from this. How?