SQL can only ever return a fixed number of columns (without being "clever" with stored procedures, etc.).
(FYI - the answer you link to is, IMHO, being "clever". It's doing some really non-obvious things with session variables etc., which, if they don't hurt performance, certainly hurt readability; so I'm going to try to answer your question from a different angle.)
You could therefore fix (hardcode) the breadcrumb "depth", and with a fixed number of JOIN statements everything becomes very simple.
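For illustration only, a sketch of that hardcoded approach (it assumes a ParentId column on ebay_categories, which your schema may name differently, and a maximum depth of 3):

SELECT CONCAT_WS(" > ", `c3`.`CategoryName`, `c2`.`CategoryName`, `c1`.`CategoryName`)
FROM `ebay_categories` AS `c1`
LEFT JOIN `ebay_categories` AS `c2` ON (`c2`.`CategoryId` = `c1`.`ParentId`)
LEFT JOIN `ebay_categories` AS `c3` ON (`c3`.`CategoryId` = `c2`.`ParentId`)
WHERE `c1`.`CategoryName` LIKE "Post%";

CONCAT_WS skips NULLs, so shallower paths still produce a sensible crumb; but anything nested deeper than the hardcoded join count gets silently truncated.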
I'm assuming the breadcrumb depths are intended to be anything between 1 and infinity? i.e. another "collection" of items may be filed under a smaller depth of categories?
This being the case, your GROUP_CONCAT is probably one part of the solution, because it emulates "variable column counts" in SQL. (It returns a single column, but that column can contain a flexible number of delimited values.)
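For example, a minimal sketch (it assumes the sample category IDs from the table further below exist in your ebay_categories table):

SELECT GROUP_CONCAT(`CategoryName` SEPARATOR " > ")
FROM `ebay_categories`
WHERE `CategoryId` IN (20081, 13851, 100904);

Whether that WHERE matches one row or twenty, the result is still a single column holding one delimited string.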
Your problem is that SQL can still only join one table to another (single) table per JOIN statement. Your breadcrumb data structure is well normalised, and implies one join for each sub-category-to-parent relationship.
You could try to dynamically build the SQL - but that's probably going to burn you. You're probably left with two "obvious" options:
- Stored procedure. (Go beyond basic SQL.)
- Change your schema. (Solve the problem when the data is stored, rather than retrieved.)
Stored procedures could solve this in a number of ways - one obvious option is to build each breadcrumb procedurally in a loop, storing each in a temporary table, then finally selecting the entire temporary table.
I'm happy to give you guidance on this but I won't do it as part of this answer (unless requested to) because I'm fairly sure the performance will be sufficiently bad that you ultimately won't want to use it.
The other "major" option then is to restructure the schema. In this scenario the level of normalisation you've achieved is making things unduly complex. It's good "academically" and it's good for disk space. But it's not lending itself to solving your problem very well!
Denormalising has one other major trade-off. You'll have more complexity when changing the data in the schema. I'd recommend starting off by writing a routine which "rebuilds" the data (if you take this approach - I've sketched one at the end of this answer), because otherwise things will get out of sync and you'll spend forever trying to work out what went wrong. (I speak from experience.)
For every matching record (you're comparing user input to CategoryName) you want to return, and be able to group by, everything which precedes it in the tree. And without doing "clever" stuff.
One (of several) denormalisation approaches would be to maintain a depth * width long list of leaves to ancestors. (Like I said, it's not storage efficient. You'll have to evaluate whether that is a problem in a production scenario.) For your example data, it would look like this:
+------------+--------+
| AncestorId | LeafId |
+------------+--------+
| 20081 | 66840 |
| 20081 | 96762 |
| 13851 | 66840 |
| 13851 | 96762 |
| 100904 | 66840 |
| 100904 | 96762 |
| 66840 | 66840 |
| 96762 | 96762 |
+------------+--------+
Thus you can now do something like this:
CREATE TABLE `tree_branches` (
  `AncestorId` int(11) NOT NULL,
  `LeafId` int(11) NOT NULL
);

INSERT INTO `tree_branches` (`AncestorId`, `LeafId`) VALUES
  (20081, 66840),
  (20081, 96762),
  (13851, 66840),
  (13851, 96762),
  (100904, 66840),
  (100904, 96762),
  (66840, 66840),  -- each leaf is also listed as its own ancestor,
  (96762, 96762);  -- so a matched leaf appears in its own crumb
SELECT
GROUP_CONCAT(`breadCrumbCategories`.`CategoryName` SEPARATOR " > ")
FROM `ebay_categories` AS `matchedCategory`
-- every leaf at or beneath the matched category
INNER JOIN `tree_branches` AS `matchedCategoryLeaves` ON (`matchedCategoryLeaves`.`AncestorId` = `matchedCategory`.`CategoryId`)
-- fan back out to every ancestor of each of those leaves
INNER JOIN `tree_branches` AS `breadCrumbs` ON (`breadCrumbs`.`LeafId` = `matchedCategoryLeaves`.`LeafId`)
-- resolve the ancestor IDs back to names for the crumb text
INNER JOIN `ebay_categories` AS `breadCrumbCategories` ON (`breadCrumbCategories`.`CategoryId` = `breadCrumbs`.`AncestorId`)
WHERE
`matchedCategory`.`CategoryName` LIKE "Post%"
GROUP BY
`breadCrumbs`.`LeafId`
;
You should add some sort of ordering to the GROUP_CONCAT to ensure it doesn't concatenate the crumbs in an implicitly unexpected order. You could (for example) maintain a level ID for that purpose.
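For example, a sketch only - the Depth column here (how far the ancestor sits above the leaf, 0 being the leaf itself) is an assumption, not part of the schema above:

ALTER TABLE `tree_branches` ADD COLUMN `Depth` int(11) NOT NULL DEFAULT 0;

SELECT
GROUP_CONCAT(`breadCrumbCategories`.`CategoryName` ORDER BY `breadCrumbs`.`Depth` DESC SEPARATOR " > ")
FROM `ebay_categories` AS `matchedCategory`
INNER JOIN `tree_branches` AS `matchedCategoryLeaves` ON (`matchedCategoryLeaves`.`AncestorId` = `matchedCategory`.`CategoryId`)
INNER JOIN `tree_branches` AS `breadCrumbs` ON (`breadCrumbs`.`LeafId` = `matchedCategoryLeaves`.`LeafId`)
INNER JOIN `ebay_categories` AS `breadCrumbCategories` ON (`breadCrumbCategories`.`CategoryId` = `breadCrumbs`.`AncestorId`)
WHERE
`matchedCategory`.`CategoryName` LIKE "Post%"
GROUP BY
`breadCrumbs`.`LeafId`
;

Ordering by Depth DESC puts the root first, so the crumb reads root > ... > leaf.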
Update:
Once you have understood what I've done above, you should test it with LIKE 'Ant%' and observe the erroneous output. Add a second column to the GROUP BY and a DISTINCT to solve the problem caused by user queries matching multiple crumbs which are ancestors of the same leaf.
SELECT
DISTINCT GROUP_CONCAT(`breadCrumbCategories`.`CategoryName` SEPARATOR " > ")
FROM `ebay_categories` AS `matchedCategory`
INNER JOIN `tree_branches` AS `matchedCategoryLeaves` ON (`matchedCategoryLeaves`.`AncestorId` = `matchedCategory`.`CategoryId`)
INNER JOIN `tree_branches` AS `breadCrumbs` ON (`breadCrumbs`.`LeafId` = `matchedCategoryLeaves`.`LeafId`)
INNER JOIN `ebay_categories` AS `breadCrumbCategories` ON (`breadCrumbCategories`.`CategoryId` = `breadCrumbs`.`AncestorId`)
WHERE
`matchedCategory`.`CategoryName` LIKE "Ant%"
GROUP BY
-- one group per matched category per leaf, so two matches sharing a leaf
-- no longer collapse into one garbled concatenation...
`breadCrumbs`.`LeafId`,
`matchedCategory`.`CategoryId`
-- ...and DISTINCT then removes the resulting duplicate crumb strings
;
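Finally, the "rebuild" routine mentioned earlier - a sketch only, assuming MySQL 8+ (for recursive CTEs) and a hypothetical ParentId column on ebay_categories which is NULL at the root; adapt the names to your real schema:

TRUNCATE `tree_branches`;

INSERT INTO `tree_branches` (`AncestorId`, `LeafId`)
WITH RECURSIVE `paths` (`AncestorId`, `LeafId`) AS (
  -- seed: every leaf (a category nothing else points to as parent), paired with itself
  SELECT `l`.`CategoryId`, `l`.`CategoryId`
  FROM `ebay_categories` AS `l`
  WHERE NOT EXISTS (
    SELECT 1 FROM `ebay_categories` AS `c` WHERE `c`.`ParentId` = `l`.`CategoryId`
  )
  UNION ALL
  -- walk upwards: each parent becomes a further ancestor of the same leaf
  SELECT `c`.`ParentId`, `p`.`LeafId`
  FROM `paths` AS `p`
  INNER JOIN `ebay_categories` AS `c` ON (`c`.`CategoryId` = `p`.`AncestorId`)
  WHERE `c`.`ParentId` IS NOT NULL
)
SELECT `AncestorId`, `LeafId` FROM `paths`;

If anything queries tree_branches while this runs, rebuild into a staging table and swap it in afterwards instead of truncating in place.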