0

I have a SQL table consisting of names, addresses and some associated numerical data paired with a code. The table is structured such that each number-code pair has its own row with redundant address info. abbreviated version below, let's call it tblPeopleData

Name            Address                       ArbitraryCode    ArbitraryData
----------------------------------------------------------------------------
John Adams      45 Main St, Rochester NY            a                111
John Adams      45 Main St, Rochester NY            a                231
John Adams      45 Main St, Rochester NY            a                123
John Adams      45 Main St, Rochester NY            b                111
John Adams      45 Main St, Rochester NY            c                111
John Adams      45 Main St, Rochester NY            d                123
John Adams      45 Main St, Rochester NY            d                124
Jane McArthur   12 1st Ave, Chicago IL              a                111
Jane McArthur   12 1st Ave, Chicago IL              a                231
Jane McArthur   12 1st Ave, Chicago IL              a                123
Jane McArthur   12 1st Ave, Chicago IL              b                111
Jane McArthur   12 1st Ave, Chicago IL              c                111
Jane McArthur   12 1st Ave, Chicago IL              e                123
Jane McArthur   12 1st Ave, Chicago IL              e                124

My problem is that this table is absolutely massive (~10 million rows) and I'm trying to split it up to make traversal less staggeringly sluggish.

What I've done so far is to make a table of just addresses, using something like:

SELECT DISTINCT Address FROM tblPeopleData (etc.)

Leaving me with:

Name            Address                       
------------------------------------------
John Adams      45 Main St, Rochester NY      
Jane McArthur   12 1st Ave, Chicago IL           

...just a list of addresses. I want to be able to look up each address and see which names reside at that address, so I assigned each address a UniqueID, such that now I have (this table is around ~500,000 rows in my dataset):

Name            Address                       AddressID
--------------------------------------------------------
John Adams      45 Main St, Rochester NY      000001
Jane McArthur   12 1st Ave, Chicago IL        000002

In order to be able to look up people by address though, I need this AddressID field added to tblPeopleData, such that each address in tblPeopleData is associated with its AddressID and this is added to every row, such that I would have:

Name            Address                       ArbitraryCode    ArbitraryData    AddressID
----------------------------------------------------------------------------------------
John Adams      45 Main St, Rochester NY            a                111          00001
John Adams      45 Main St, Rochester NY            a                231          00001
John Adams      45 Main St, Rochester NY            a                123          00001
John Adams      45 Main St, Rochester NY            b                111          00001
John Adams      45 Main St, Rochester NY            c                111          00001
John Adams      45 Main St, Rochester NY            d                123          00001
John Adams      45 Main St, Rochester NY            d                124          00001
Jane McArthur   12 1st Ave, Chicago IL              a                111          00002
Jane McArthur   12 1st Ave, Chicago IL              a                231          00002
Jane McArthur   12 1st Ave, Chicago IL              a                123          00002
Jane McArthur   12 1st Ave, Chicago IL              b                111          00002
Jane McArthur   12 1st Ave, Chicago IL              c                111          00002
Jane McArthur   12 1st Ave, Chicago IL              e                123          00002
Jane McArthur   12 1st Ave, Chicago IL              e                124          00002

How do I make this jump from having UniqueIDs for AddressID in my unique addresses table, to adding these all to each row with a corresponding address back in my tbPeopleData?

UpQuark
  • 791
  • 1
  • 11
  • 35
  • You need three tables here. One for Name, another for address and a third for the arbitrary data. The name table would have a foreign key to the address table and the arbitrary data table would have a foreign key to the name/people table. – Sean Lange Jul 20 '15 at 16:15
  • I could try this. There is actually already a pre-existing unique ID that binds ArbitraryData and Person, so I could split it up without much trouble. – UpQuark Jul 20 '15 at 17:26

2 Answers2

1

Just backfill the calculated AddressID back to tblPeopleData - you can combine an UPDATE with a FROM (like you would do in a select)

UPDATE tblPeopleData 
SET AddressID = a.AddressID
FROM tblPeopleData pd
INNER JOIN tblAddressData a
 ON pd.Address = a.Address
Jamiec
  • 133,658
  • 13
  • 134
  • 193
  • Thanks, this worked for me perfectly. Additionally, since I was dealing with ~10 million rows and the query took a long time, [this answer](http://stackoverflow.com/a/4754632/962986) was incredibly helpful for tracking progress. – UpQuark Jul 30 '15 at 14:43
1

You would alter the table to have the address id:

alter table tblPeopleData add AddressId int references Address(AddressId);

Then you can update the value using a JOIN:

update tblPeopleData pd JOIN
       Address a
       ON pd.Address = a.Address
    pd.AddressId = a.AddressId;

You will definitely want an index on Address(Address) for this.

Then, you can drop the old column:

alter table drop column Address;

Note:

It might be faster to save the results in a temporary table, because the update is going to generate lots and lots of log records. For this, truncate the original table, and re-load the data:

SELECT . . . , a.AddressId
INTO tmp_tblPeopleData
FROM tblPeopleData pd JOIN
     Address a
     ON pd.Address = a.Address;

TRUNCATE TABLE tblPeopleData;

INSERT INTO tblPeopleData( . . .)
    SELECT . . .
    FROM tmp_tblPeopleData;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786