Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.
My current database schema is as follows:
item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description | picture
item_synonyms: id | item_name_id | name | error(BOOL)
Notes: error
indicates a wrong spelling (eg. "Ericson"). description
and picture
of the item_names
table are "globals" that can optionally be overridden by "local" description
and picture
fields of the items
table (in case the store owner wants to supply a different picture for an item). common
helps separate unique item names ("Jimmy Joe's Cheese Pizza" from "Cheese Pizza")
I think the bright side of this schema is:
Optimized searching & Handling Synonyms: I can query the item_names
& item_synonyms
tables using name LIKE %QUERY%
and obtain the list of item_name_id
s that need to be joined with the items
table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
Autocompletion: Again, a simple query to the item_names
table. I can avoid the usage of DISTINCT
and it minimizes number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")
The down side would be:
Overhead: When inserting an item, I query item_names
to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names
table (just to keep things clean; accounts for possible erroneous submissions). And updating is the combination of both.
Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:
items: id | name | picture | price | description | picture
(... with item_names
and item_synonyms
as utility tables that I could query)
- Is there a better schema you would suggested?
- Should item names be normalized for autocomplete? Is this probably what Facebook does for "School", "City" entries?
- Is the first schema or the second better/optimal for search?
Thanks in advance!
References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT
EDIT: In the event of 2 items being entered with similar names, an Admin who sees this simply clicks "Make Synonym" which will convert one of the names into the synonym of the other. I don't require a way to automatically detect if an entered name is the synonym of the other. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears the confusion.
UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names
and item_synonyms
tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:
items: id | name | picture | price | description | picture
Thanks everyone for the help!