0

I have an object called listings, and a listing can have a number of amenities (compare this to post, tags model). It's a sort of has/doesn't have relationship. So I have an amenities table which stores what amenities there can be in the system, there is the listings table that stores listings, listing_amenities which stores amenities.

The problem is searching listings on the basis of amenities

I can search listing on the basis of any of the parameters of listing(rent, location, etc), but not amenities since that requires a 2 table join. I haven't really been able to come up with an efficient solution to do this. So, I'd like a little help on how to search on this basis.

One solution I could think of was to inject an extra field with group concat of amenities

 SELECT * 
 FROM(SELECT listings.*, GROUP_CONCAT(CONCAT("(",Concat(amenity_id,")"))) as amenities 
      FROM `listings`,`listing_amenities`  
      WHERE listings.id=listing_amenities.listing_id 
      GROUP BY listings.id) as rs  
 WHERE amenities like "%(1)%"

i.e concatenate and add amenity ids wrapped in () and search on them

which gets me to thinking: amenities are only used in context of listing and for searching so why not store them in a similar format (1), (2) to indicate available amenities in a seperate column in the listings table and save on a table and extra query cost? What are the disadvantages of this approach, if any?

For an easier understanding of table

 Listing Table
 --------
 id title description rent

 Amenities
 -------------
 id name description

 Listing_Amenities
 ------------------
 id listing_id amenity_id

I am thinking I could delete the third table

and modify listing

 Listing Table
 --------
 id title description rent amenties(the amenities available in the appartment wrapped in bracket)

like

 1 "House in Downtown Discworld" "Nice house, running water, teleporter" 2000 "(1)(5)(7)"

This kind of allows me to search by amenities without needing a join. I can run the same query where I select rent

and search with the where clause: where amenities like "%(1)%" AND amenities like "%(2)%"

The only problem I see in this approach is a deletion anomaly i.e if I delete an amenity from system, it won't delete amenity from listings automatically

but I see this as an unlikely occurence since amenities are decided by the admin and will be pretty much frozen before site launch and secondly, while we might add new amenties, its unlikely we'll delete any.

Are there any other disadvantages. Can you please highlight specific scenarios. I see this as more efficient than a join. And frankly easier to wrap my head around. I could be wrong

SoWhat
  • 5,564
  • 2
  • 28
  • 59
  • What's wrong with joining the two tables? We know that string comparison is less efficient than integer comparison (ID), but it might not be a significant performance hit in your situation. – Hart CO Jun 09 '13 at 04:42
  • So `amenities_listings` is a subset of `amenities? – RandomSeed Jun 09 '13 at 10:01
  • I updated the question. Its `listing_amenities`, not the other way round – SoWhat Jun 09 '13 at 16:46
  • 1
    No, no, no, do NOT do that. Keep the 3rd table. – ypercubeᵀᴹ Jun 10 '13 at 08:37
  • 1
    Please read this: **[Is using multiple foreign keys separated by commas wrong, and if so, why?](http://dba.stackexchange.com/questions/17808/is-using-multiple-foreign-keys-separated-by-commas-wrong-and-if-so-why/17810#17810)** – ypercubeᵀᴹ Jun 10 '13 at 08:41
  • 1
    And this: **[Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad)** – ypercubeᵀᴹ Jun 10 '13 at 08:43

3 Answers3

4

Unable to grasp the table model from your post... But in general I think you should use many to many relationship like this. A seperate table, just to hold the relationships.

ListingsAmenities
----------
ListingID
AmenitiesID    

The Amenities table should hold each amenities only once. Then you need not have to concatenate.

With this you should be able to search listings based on amenities Id too..

When you store the relationship in a column in tables, you can store only one Listing per amenity... or you should start concatenating... Instead if you store like this, the same listing can have many rows with various amenities, which makes querying easy.

Like this.

ListingAmenitiesTable:

ListingID   AmenityID
---------   ----------
L1      A1
L1      A2
L2      A2
L2      A3      
The King
  • 4,600
  • 3
  • 39
  • 58
  • Good point, missed that in the question that there doesn't appear to be an associative entity. – Hart CO Jun 09 '13 at 04:51
  • there is an associative entry exactly like this. I am just wondering if i should store the association in column i.e instead of having a seperate table listing the association, I simply store the amenity ids in a column in the listing's row itself. – SoWhat Jun 09 '13 at 16:36
  • I see that it avoids a join and allows me to search by amenity id quite easily in the same query where rent and all is specified. I am trying to debate the advantages of an association table to storing the association (amenity id) in a column in the table – SoWhat Jun 09 '13 at 16:37
1

In a transactional (as opposed to reporting) system, it is usually a bad idea to pre-optimize for performance. Build your model based on third normal form (3NF) and then test your system to see if it has unacceptable performance. If it does, carefully examine you indexes and query plans to see if there is a way to optimize performance. As a last resort, denormalize your data.

The 0NF design you are proposing is less flexible and less powerful and will almost certainly perform worse than a normalized design at scale. It is a mistake to assume that joins are inefficient. RDBMS are built specifically to be good at joins.

You can include your desired amenities in your query quite easily using WHERE EXISTS syntax.

SELECT L.*
FROM listings L
WHERE L.rent < 1000
  AND L.location = 'desirable-location-x'
  AND EXISTS (SELECT A.id 
              FROM amenities A
                inner join listing_amenities LA
                  on A.id = LA.amenities_id
              WHERE A.name like '%walk-in%'
                AND LA.listing_id = L.id)

Note the correlated sub-query using EXISTS allows you to find what you are looking for without having to violate any normal forms. If you have proper indexes on your intersection table and on your listings and amenities tables then performance will be good.

Joel Brown
  • 14,123
  • 4
  • 52
  • 64
  • Thanks for the answer, but I am trying to understand what problems appear if I actually store amenity-listing association in a column on the table instead of a seperate table. That's all. I can write a query alright, that's not the issue at all – SoWhat Jun 10 '13 at 03:22
  • @SomeshMukherjee - The question is what problem do you think you are solving by denormalizing to 0NF? This practice is generally frowned upon because it makes data maintenance harder and queries much less efficient. By using string searching every query becomes a table scan so your indexes are useless. Also, how would you print a listing sorted or grouped by amenity? With your 0NF design you can (inefficiently) select out certain records, but you can't do anything else that a relational database is meant to do efficiently. – Joel Brown Jun 10 '13 at 11:37
1

Traditional database design recommends the model you describe, with the "many to many" relationship extracted into a separate table.

There are many reasons to do this - not just the "deletion" anomaly, but also performance - parsing a varchar column to extract the pattern "(*)" is significantly slower than a join between integers. Imagine the query you'd need to find all listings with "view of downtown Ankh Morpork" and "walking distance to Unseen University", but without "direct access to Fool's Guild".

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52