I have a db table persons in which only persons details are captured. Say Name, Father Name, Email, DOB, Proof of Address, Proof of Identity ,Pincode etc. and I have an address table in which address of the persons are stored say Address, Pincode, City, Province. Persons and Address have a foreign key relationship with column addressId(persons) and Id(address).
NOTE: Using Postgres
Now what I have to do say I am given with some proof of address id and I have to find how much such persons with same POA or POI are there and with different id if somewhere Name, Father Name and Address (all three matches).
Contacts Schema
Address Schema
ID | Address | Pincode | City | Province |
---|---|---|---|---|
ad_1 | mini market 850 vijay street | 110056 | XYZ | ABC |
ad_2 | 45 street south coast near Bank | 285006 | MNO | DEF |
ad_3 | 45 street south coast near Bank | 285006 | MNO | DEF |
E.g Input POA Id = poi_1
considering input poa id, we can do direct string matching query on DB.
select * from persons where poi_id = 'poi_1' or poa_id = 'poi_1';
The result is person records with Id 1,2. Now if you observe person 2 has same poi/poa id but address is different. Now we have to consider this address and do dedupe based on this address based on which person with id 3 comes in response. (Name,Fathername and address all 3 must match)
I need an optimised way to resolve the same. If I do like query on name and father name only, It will do a full db scan which is not feasible at all. Same goes for address like query.
My thought:
select * from addressTable where address like = %input address% and pincode = '285006';
I haven't thought how should i proceed bcz everytime I stuck on what basis I should do a query so that it is optimised.
**Note:**
indexes mentioned below
"person_new_dob_idx" btree (dob)
"person_new_email_idx" btree (email)
"person_fst_name_idx" btree (name)
"person_poa_idx" btree (poa_id)
"person_new_poi_idx" btree (poi_id)
"person_addr_id_idx" btree (addr_id)
"person_dob_primary_pincode_idx" btree (dob, primary_pincode)
Postgres version:-- 13.4
Points to consider:
- In the mentioned example pincode is different, but there might be possibility of different address within same pincode.
- Do care for false positives and true negatives.
- Address words might be shuffled or written same address in different way.