1

I have table called location that looks like this:


|  zip_code  | city          | state |
|  --------  | ------------- | ----- | 
| 01108-2005 |               |       |
| 02130-1911 | Jamaica Plain | MA    |
| 02130-1911 | Jamaica Plain |       |
| 02138-1557 | Cambridge     | MA    |
| 02138-1557 |               | MA    |
| 02138-1557 |               |       |
| 02370-2509 |               | MA    |
| 02370-2509 |               | MA    |

I only want unique zip codes and as many corresponding city/state values as possible. For example:

  • the zip code 01108-2005 is already unique, so I will keep it even though there is no city or state.
  • the zip code 02138-1557 at least has state in one record, but not in the other, so I want to only keep the record that has a state.
  • the zip code 02130-1911 has a record with only city as well as a record with both city and state. I want the record with both city and state, for it is the most complete.

I've tried the following code:

SELECT DISTINCT zip_code, city, state
           FROM location
           ORDER BY zip_code, city DESC, state DESC

which treats each record as distinct. Is there a way to only select the most complete records, or to merge all records that share a zip code?

I am new to StackOverflow (and to MySQL) so any tips on how I can make my question easier to understand are appreciated.

nickjf6
  • 69
  • 4
  • 1
    and what do you want if multiple records have the same zip code but different city and/or states? Which would you want? for example, the zip code 94608 in California is used for both Emeryville and parts of Oakland. IMO the question is clear; just not how to handle edge cases. – xQbert Dec 03 '21 at 16:43
  • google results [https://www.google.com/search?q=94608+includes+what+cities%3F&rlz=1C1GCEA_enUS786US786&ei=E1KqYYfbNIXYytMPjea8-AM&ved=0ahUKEwjHsJ6Xksj0AhUFrHIEHQ0zDz8Q4dUDCA4&uact=5&oq=94608+includes+what+cities%3F&gs_lcp=Cgdnd3Mtd2l6EAMyBQghEKABMgUIIRCgATIFCCEQoAE6BwgAELADEEM6BQgAEIAEOggIABCABBDJAzoJCAAQyQMQFhAeOgYIABAWEB46BQgAEM0COgUIIRCrAjoICCEQFhAdEB46BwghEAoQoAFKBAhBGAFKBAhGGABQ9wVY1C1goS9oBXAAeACAAZkBiAGSGJIBBDMuMjOYAQCgAQHIAQrAAQE&sclient=gws-wiz] As you can see Problem: – xQbert Dec 03 '21 at 17:22
  • That is a good point, I did not consider that case. I was under the impression that the additional four digits from USPS would make the ZIP unique. Maybe it would make sense to use a surrogate key, rather than using the ZIP as the primary key. – nickjf6 Dec 04 '21 at 15:42
  • One site says: "The last four digits are not specific to a city but to a street address or Post Office Box." USPS Site says: "[123] : Sectional Center or Large City [45] : Post Office facility or Delivery Area [ - ] : The required "dash" or "hyphen" separates the first five digits from the last four digits; the +4 [67] : Sector or Several Blocks [89] : Segment or One Side of a Street" So I don't think I could assume a zip code associates to a single city even with the + 4. – xQbert Dec 06 '21 at 13:46
  • I think your of zip code is fine as PK: but not with City/State in the same table; since it could exist multiple times with different cities. The data needs to be denormalized; but I don't fully understand your use case. For the situation presented however, We just need to know rules of how do we determine which city/state to pick when multiple exist. Pick one alphabetic 1st, last, or pick all or... something else... – xQbert Dec 06 '21 at 13:49

2 Answers2

2

Using this concept: MySQL Orderby a number, Nulls last

Without test data: Simple query for location

SELECT zip_Code, City, State 
FROM (SELECT zip_Code, city, state, row_number() over (partition by zip_Code order by isNull(city), city ASC, isnull(state), state ASC) RN
     FROM LOCATION) subq
WHERE RN=1

With Test Data: TESTED:

 SELECT zip_Code, City, State 
 FROM (SELECT zip_Code, city, state, row_number() over (partition by zip_Code order by isNull(city), city ASC, isnull(state), state ASC) RN
       FROM (SELECT '02130-1911' zip_Code, 'Jamaica Plain' city, 'MA' state  UNION ALL
             SELECT '02130-1911','Jamaica Plains', 'MA' UNION ALL
             SELECT '02130-1911', 'aAKLAND'      , 'MA' UNION ALL 
             SELECT '01108-2005', NULL           , null UNION ALL 
             SELECT '02130-1911', 'Jamaica Plain', NULL UNION ALL
             SELECT '02138-1557', 'Cambridge'    , 'MA' UNION ALL
             SELECT '02138-1557', NULL           , 'MA' UNION ALL
             SELECT '02138-1557', NULL           , NULL UNION ALL  
             SELECT '02370-2509', NULL           , 'MA' UNION ALL 
             SELECT '02370-2509', NULL           , 'MA' ) LOCATION) subQ
 WHERE RN=1

Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a03800d0e61713ee61b4c550d395c096

Giving us below and handling edge case I believe This is giving us the city/state alphabetically first when multiple exist. We could check to see if a city/state is populated assign a 1 add both together. Use result to assign a rank/row and return all such occurrences if multiple are "as complete". Like I said Edge cases and business rules... :P

+------------+-----------+-------+
|  zip_Code  |   City    | State |
+------------+-----------+-------+
| 01108-2005 |           |       |
| 02130-1911 | aAKLAND   | MA    | <-- My edge case
| 02138-1557 | Cambridge | MA    |
| 02370-2509 |           | MA    |
+------------+-----------+-------+
xQbert
  • 34,733
  • 2
  • 41
  • 62
0

Check for not null in state

SELECT DISTINCT zip_code, city, state
FROM location
WHERE state is not null
 ORDER BY zip_code, city DESC, state DESC
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107