12

I am designing a database for a real estate application. It is proving to be more involved than I had anticipated (maybe I am overcomplicating things).

The problems essentially are due to the presence of:

  • synonyms For example the terms: flat, apartment and penthouse may all refer to essentially, the same type of property
  • attributes (different property types have different attributes) For example an apartment could be a ground floor or top floor etc

I have ended up with a rather (unintentionally) elaborate classification tree for different property types. The tree nodes are the actual instances of property types.

I want to create a database so that I can query using not only any of the synonyms, but also the attributes.

So for example, the query (in pseudo SQL):

SELECT * from properties where synonym="flat" and attribute IN ('ground floor', 'garden');

should return a list of apartments|flats that are either ground floor AND have a garden.

Can someone help me with how to design the database schema so as to allow the kind of querying described above?

Last but not the least, I will be using either MySQl or PostgreSQL as the backend database, but would prefer the approach to be db agnostic - if possible.

Noah
  • 15,080
  • 13
  • 104
  • 148
oompahloompah
  • 129
  • 1
  • 1
  • 4

2 Answers2

35

I would take a different approach to your attribution scheme. Rather than treating different attributions as synonyms, I would treat them as overlapping, or more specifically, nested descriptions of a property. This would handle your business case while at the same time acknowledging the astute observation made by Mike Sherrill.

Here is a quick ERD sketch:

ERD

By way of a very fast data dictionary:

PROPERTY is a piece of real estate.

CATEGORY is a collection of descriptive attributes. The point of this table is more as an organizer of attributes than anything else. It could include things like "type of property", "ownership structure", "number of bathrooms", and whatever else might be of interest.

ATTRIBUTE is a specific quality of interest. Note the involuted relationship on this entity type. I'll deal more with that later. The main point is that attributes can be more general or more specific and some attributes can be seen as refinements of other attributes.

DESCRIPTOR is the intersection of a PROPERTY and the ATTRIBUTEs that have been associated with that particular piece of real estate.

So how is this supposed to help?

The key is how attributes work. If you use a nested set model, then you can address more or less specific attribution and searching criteria. Consider the following diagram of one potential CATEGORY with its associated ATTRIBUTEs:

enter image description here

In this example the CATEGORY is "type of property". You can see from the diagram that there is a hierarchical breakdown of attributes in this category. Each box in the diagram is a record in ATTRIBUTE. Boxes that contain other boxes have child attributes. Boxes that are inside another box have an FK to their containing box and so forth.

In this way, you could say "I want to find a property that is a Penthouse". You can then find PROPERTY records with a related DESCRIPTOR that points at the "Penthouse" ATTRIBUTE. That is pretty easy. But what if your search comes up empty?

The advantage of this approach is that you can then loosen your criteria by saying, "let's go up the attribution hierarchy to the next less-specific thing than penthouse". In my example, that would be "Highrise". Now you try your search again and you might have better luck.

A system like this gives you the ability to be as specific as you want in each category of attribution while relaxing the others far enough to start getting search hits. This is really what a real estate agent's work is about is it not? Helping the client to make the necessary compromises to find the best fit to their most important criteria?

Handling Nested Sets

The only tricky part of this approach is how to handle the nested sets. There are lots of ways to do this, many of which have been thoroughly documented elsewhere. I myself like the visitation number technique, especially for relatively static data sets. This makes it very easy to find matches for some given ATTRIBUTE or any of its children without having to do anything exotic in your SQL.

EDIT: So How Does This Work?

OP asked how do you handle things like number of bedrooms and what do the queries look like? Let's take another example for illustration:

Bedroom Example

The above shows the nested sets for the CATEGORY "Number of Bedrooms". I've also added the visitation numbers to the diagram. Note the way the visitation numbers work, in particular, note that the left (green) and right (red) numbers for any given attribute value contain the left and right visitation numbers for any subordinate attributes. For example, "2+ Bedrooms" has left and right numbers 6 and 15 respectively. Every attribute that falls under "2+ Bedrooms" has left and right numbers that fall within this range.

So how would you query for a properties with a given descriptor? Let's say we want to find all properties with two or more bedrooms. The SQL for such a query might look something like this:

select P.* 
from PROPERTY P
  inner join DESCRIPTOR D
    on P.id = D.property_id
  inner join ATTRIBUTE A
    on D.attribute_id = A.id
where A.left >= (select X.left from ATTRIBUTE X
                 where X.name = '2+ Bedrooms')
  and A.right <= (select Y.right from ATTRIBUTE Y
                  where Y.name = '2+ Bedrooms')

Note that the above query is a little different that what you might actually use. For example, you'd probably look up the filtering attribute using its int identity key instead of its string name. However, I thought I'd leave it as shown for clarity around the main point, which is you filter by looking not for a specific related attribute, but for any related attributes that fall within your filter range.

If you wanted to filter on multiple attributes, then just add more sub-clauses to your where clause.

Joel Brown
  • 14,123
  • 4
  • 52
  • 64
  • I think you are on the right path there (+1 for the nice diagrams too) - but I somehow, still don't "get it" in terms of how I could actually put this into practice - i.e. 1. Design a schema 2. Run a sample query against this schema. I am particularly struggling with how the category can also be used for something like number of rooms (as well as property type), and how I can search for say, a 2 bedroom penthouses in a given post code? – oompahloompah Sep 15 '11 at 12:40
  • And if the penthouse query returned no matches, what kind of SQL statement would I issue to tell the DB to fetch records for the next level up? Sorry to sound dumb, but I want to make sure I get this right first time. – oompahloompah Sep 15 '11 at 12:44
  • Please see my edit regarding how to handle things like number of bedrooms and what the SQL looks like. As to how to back out to a less restrictive search, what you would do is look up the original filtering attribute, say "Penthouse" and use it's FK to `ATTRIBUTE` to find the next wider attrribute value, which in this case is "Highrise", then you repeat the query with "Highrise" and hope you find some hits. – Joel Brown Sep 15 '11 at 13:19
  • 1
    thanks for the detailed response. I am convinced that your answer is the correct one, but I am having to read up on a lot of additional matter to make sure I fully understand what you wrote (and more importantly), that I can implement it into a working solution. So it may be a little while before I accept you answer (I may still have to ask you some more "dumb" questions) - I hope you don't mind. – oompahloompah Sep 15 '11 at 20:41
  • @JoelBrown It's a really awesome approach, the only thing that disturbs me is that `CATEGORIES` are semantically detached: you're able to set category **Property Type ... Land** and at the same time **Number of Bedrooms ... 5+**. . As far as I understand it requires you to query for both CATEGORY (I've mentioned) at once? If it does then we should show all available search options to user (dropdowns with categories for every matter)? – lexeme Mar 20 '17 at 12:24
  • @lexeme The example is limited by the basic scope of the original question. The model handles multiple, concurrent criteria types in a query. You could extend the example by adding multiple sets of joins and extra terms in the where clause to cover multiple categories. If you wanted to have one dropdown per category, for example, you could easily display that and it wouldn't be especially hard to extend the query to either AND or OR those dropdown selections in your query. – Joel Brown Mar 20 '17 at 17:45
2

To handle synonyms you could have many-to-many lookup between a table containing the static list of your property types, and a table containing the synonym. This way one synonym could be mapped to more than one property type.

For example:

Table:Property Type
1 House
2 Appartment
3 Large House
4 Cave

Table:Synonym
1 house
2 flat
3 dwelling
4 condo
5 mansion

Table:PropertyType-Synonym
1 1 (House is a house
1 3 (House is a dwelling)
2 2 (Appartment is a flat)
2 3 (Appartment is a dwelling)
2 4 (Appartment is a condo)
3 1 (Large House is a house)
3 3 (Large House is a dwelling)
3 5 (Large House is a mansion)
4 3 (Cave is a dwelling)

For properties, you could utilize a kind of open attribute structure.

For example:

Table:Property
1 Apartment F, Field House Gardens
2 123 Alphabet Street, NumberTown

Table:Attribute
1 Is ground floor?
2 Number of bedrooms
3 Has garden?

Table:Property-Attribute-Values
1 1 No
1 2 2
1 3 Yes
2 2 5 
2 3 Yes

Hope this helps

tom redfern
  • 30,562
  • 14
  • 91
  • 126