I have a time-series location data table with the columns (`time`, `first_name`, `last_name`, `loc_lat`, `loc_long`), where the first three columns form the primary key. The table has more than 1M rows. I notice that `first_name` and `last_name` duplicate quite often: there are only 100 distinct combinations in the 1M rows. Therefore, to save disk space, I am thinking about creating a separate `people` table with columns (`id`, `first_name`, `last_name`) and a unique constraint on (`first_name`, `last_name`), so that the time-series location table simplifies to (`time`, `person_id`, `loc_lat`, `loc_long`), where `person_id` is a foreign key referencing the `people` table.
Before committing to this, I want to build the new tables from my existing 1M-row table and test whether the change actually yields a meaningful disk space saving. This feels quite doable, but I haven't found a concrete way to do it yet. Any suggestions?
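For reference, the normalization step itself can be sketched as below. This is only an illustration against an in-memory SQLite copy with toy data: the table names `location` and `location_slim`, the sample rows, and the column types are all invented here, and my real engine may differ. The two `INSERT ... SELECT` statements are the part I would run against the actual 1M-row table.

```python
import sqlite3

# Toy stand-in for the real database; everything here is sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original wide table: (time, first_name, last_name) is the primary key.
cur.execute("""
    CREATE TABLE location (
        time       INTEGER NOT NULL,
        first_name TEXT    NOT NULL,
        last_name  TEXT    NOT NULL,
        loc_lat    REAL    NOT NULL,
        loc_long   REAL    NOT NULL,
        PRIMARY KEY (time, first_name, last_name)
    )
""")
# 1000 rows over 5 distinct people (mimicking heavy name duplication).
cur.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?)",
    [(t, f"first{p}", f"last{p}", 1.0, 2.0)
     for t in range(200) for p in range(5)],
)

# Step 1: dimension table, one row per distinct (first_name, last_name).
cur.execute("""
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        UNIQUE (first_name, last_name)
    )
""")
cur.execute("""
    INSERT INTO people (first_name, last_name)
    SELECT DISTINCT first_name, last_name FROM location
""")

# Step 2: slim fact table keyed on the surrogate id, backfilled via a join.
cur.execute("""
    CREATE TABLE location_slim (
        time      INTEGER NOT NULL,
        person_id INTEGER NOT NULL REFERENCES people(id),
        loc_lat   REAL    NOT NULL,
        loc_long  REAL    NOT NULL,
        PRIMARY KEY (time, person_id)
    )
""")
cur.execute("""
    INSERT INTO location_slim (time, person_id, loc_lat, loc_long)
    SELECT l.time, p.id, l.loc_lat, l.loc_long
    FROM location l
    JOIN people p USING (first_name, last_name)
""")

print(cur.execute("SELECT COUNT(*) FROM people").fetchone()[0])         # 5
print(cur.execute("SELECT COUNT(*) FROM location_slim").fetchone()[0])  # 1000
```

The missing piece is the actual size comparison of the old table versus the new pair, which is engine-specific (e.g. in PostgreSQL I believe `pg_total_relation_size` would apply); that is the part I am unsure about.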