1

I'm importing a CSV into a database that contains multiple entities: City, State, and Country.

I'd like to have a hierarchy of references using foreign keys across the tables:

  • City.State_Ref -> State.ID
  • State.Country_Ref -> Country.ID

So the City table will have the following schema in the end:

CREATE TABLE "City" (
  "ID" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
  "Name" INTEGER NOT NULL,
  "State_Ref" INTEGER,
  FOREIGN KEY("State_Ref") REFERENCES "State"("ID")
)

Setting up the State and Country tables is easy enough, but when there is the same State name that occurs in different Country records, I'm unable to pick the correct one:

UPDATE City SET State_Ref = (SELECT ID from State WHERE City.State = State.Name);

This will match multiple records and LIMIT just grabs the first one (which isn't always right).

I have also tried JOIN to create a new table with the correct references:

SELECT City.Name as Name, State.ID as State_Ref

FROM City
    INNER JOIN State ON City.State = State.Name
    INNER JOIN Country ON City.Country = Country.Name

ORDER BY City.Name ASC;

But this results in multiple records for all of the State duplicates.

Sample data (the problem happens with "Wonderland" since "Colorado" exists in multiple countries):

Country         State         Name
United States   New York      Kingston  
United States   Texas         Paris 
England         Kent          Kingston  
United States   Maine         Paris 
France          Île-de-France Paris 
United States   Colorado      Denver    
Canada          Colorado      Wonderland

Any pointers for setting State_Ref properly would be much appreciated!

jheddings
  • 26,717
  • 8
  • 52
  • 65
  • I don't think I would bother normalizing the data, unless you have a specific reason -- outside this data -- to reference `country` and `state` entities. – Gordon Linoff Mar 07 '20 at 19:12
  • Yeah... This is a sample for a much larger set of data. The "State" and "Country" equivalents each have additional data associated with them. – jheddings Mar 07 '20 at 20:08

1 Answers1

0

Thanks to this answer, I was able to figure it out... I wasn't filtering for the matching Country in the second JOIN.

SELECT
    City.Name as Name, State.ID as State_Ref

FROM City
    INNER JOIN State ON City.State = State.Name
    INNER JOIN Country ON (State.Country_Ref = Country.ID AND City.Country = Country.Name)

ORDER BY City.Name ASC;
jheddings
  • 26,717
  • 8
  • 52
  • 65