I need to connect two tables in a query that I will use to insert data to third table (used in the future to join the two). I will mention only relevant columns in these tables.
PostgreSQL version 9.0.5
Table 1: data_table
migrated data, ca 10k rows, relevant columns:
id (primary key),
address (beginning of an address, string that I need to match with the second table. This address has varying length.)
Table 2: dictionary
dictionary, ca 9 mln rows, relevant columns:
id (primary key),
address (full address, string that I need to match with the first table, varying length as well.)
What exactly do I need
I need to correctly connect these tables in a select statement, and then insert these to a third table. All I need is a way to successfully connect these tables.
The way I want to do it is to take each address from data_table, and join it with first address (edit: order by address asc) from dictionary that begins with data_table.address (without multiplying records, as a lot of addresses in dictionary begin with each data_table.address).
Also, addressess in both tables contain a lot of irregular spaces, so we probably need to
replace(address, ' ', '')
on both of them (any alternative ideas welcome). There might also be some performance issues since dictionary has 9 mln rows and the server is rather slow.
I see the result as some variation of following query:
select
data_table.id, dictionary_id
from
data_table, dictionary
where
-conditions-