0

I'm trying to design a MySQL database to store user zipcode preferences for providing a particular service. For example, user A who is a plumber is willing to travel to x,y, and z zipcodes to provide his service. I've been thinking about the various ways of implementing this, and scalability is very important. Also, I would like to have a mapping between the zip code and the city name(s).

One way of designing this is to create a huge table with each column representing a zip code and each row will store a user entry with the zip code preferences. But as I add zip codes and say millions of users, how will this scale? I don't think it will scale well, but it's simple to implement.

Another way is to make this hierarchial by having a primary table and a bunch of secondary tables. The secondary tables hold lets say zip codes for a county, and the primary table acts as the key to the secondary. I think this would scale better as the tables can be distributed, but I may have minor redundancy because a user can be stored in several tables.

Anyway, I would appreciate any thoughts, ideas, or alternative designs that could help me with this. The question really comes down to, how should I design this and why?

Update: I have a separate table with user info. I'm trying to design table(s) for the user's zip code preferences.

WindsurferOak
  • 4,861
  • 1
  • 31
  • 36
  • 1
    Side issue: there's not necessarily a one-to-one correspondence between ZIP codes and city names. Along the Texas border, there are probably even a few ZIP codes that still map to two *states*. ZIP codes have to do with how mail carriers drive their route, not with geography or political boundaries. – Mike Sherrill 'Cat Recall' May 27 '11 at 10:48
  • Be careful with this method. Example: I'm near a town line. There's a pizza shop less than half a mile from me on the other side of the line. Nobody has any idea how to include me in their delivery zone. – JSWork May 27 '11 at 16:10

5 Answers5

1

I'd advise a slight enhancement to your current approach.

Use a database that supports the indexing of Geospatial data, e.g. PostgreSQL with Postgis. Then, in addition to zip codes, store the coordinates of the zip code.

Thus, when asking your plumber which zip codes he wants to serve, you'll be able to pull out the nearby zip codes. Likewise, you'll be able to pull out plumbers in the nearby areas when users are querying your database.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • MySQL does allow spatial indexes. But only on MyISAM tables. – Johan May 27 '11 at 15:23
  • MyISAM kinds of rules it out, doesn't it? :-) – Denis de Bernardy May 27 '11 at 15:25
  • 1
    Not really, **if** you don't need transactions on **that** table. MyISAM is just fine, remember you can still use a transactional engine on all your other tables. And the OP's table req. looks very static, it's just a lookup table to avoid repeated calculations. – Johan May 27 '11 at 15:46
  • I dunno. The only thing I've ever gotten with MyISAM were corrupt tables, myself. But I assume many others have had better milage. – Denis de Bernardy May 27 '11 at 16:01
1

You must define entities regardless of the scalability concern, you dont want to make a database design mistake in the first place. I guess you could have two table such as User and ZipCodes, and a table that would link user preferences with zip codes, such as UserZipCodes which will have one preferred zip code, or more for a user, depending on your requirements, (probably enforcing it with a unique constraint). I dont know for MySQL, but in SQL server reading such tables, with few columns, is not a performance issue, so you better test it beforehand.

Mentor
  • 485
  • 1
  • 5
  • 18
1

As tradesmen do not usually travel hundreds of miles to fix a leaky tap you could approach the problem as follows:

I'd simply just create a zip_code_distances table and pre-compute the distances between all 42K zipcodes in the US which are within a 20-25 mile radius of each other.... Only including zipcodes within a 20-25 miles radius of each other reduces the number of rows you need to store in the distance table from it's maximum of 1.7 billion (42K ^ 2) - 42K to a much more manageable 4 million or so...

See my full answer here:

Calculate distance between zip codes and users

Other tables you'd include would be: city, city_to_zipcode, etc...

Hope it helps :)

Community
  • 1
  • 1
Jon Black
  • 16,223
  • 5
  • 43
  • 42
0

I would use postgresql - very scalable. It's also very feature-rich. As for the schema consider splitting the table into three tables: 1. table of zipcodes 2. table of user data 3. cross-table linking zipcodes and data

Don't do it in one table!

MarkP
  • 171
  • 1
  • 5
0

As suggested by others... Three entities: User, Zip-code and a cross-ref between the two. Business rules would be... A User can service many Zipcodes. A zipcode can be serviced by many Users.

The number of rows may seem large, but really this is not a lot for a modern DBMS to handle and there are ways to get very good performance out of tables with many millions of rows. eg. Horizontal-Partitioning.

And a nice-to-have would be to store the Geospatial data to help the user select nearby zipcodes as suggested by @Denis

user212102
  • 91
  • 3