Two tables:
genes: a list of gene names (with an additional column showing the approved gene name, for direct comparison). We will update this table.
gene_synonyms: a reference / look-up table of approved gene names and their synonyms (aliases). We will use the data in this table to update the other table.
PSQL:
# SELECT * FROM genes ORDER BY id;
id  name     approved_name
--  -------  -------------
1   AIBP     NAXE
2   ABG      A1BG
3   CHP      CHP1
4   CHP1     CHP1
5   SLCA1BP  CHP1
6   NAXE     NAXE
7   AIBP     NAXE
8   APOA1BP  NAXE
9   A1B      A1BG
# SELECT * FROM gene_synonyms;
id  approved_name  synonym
--  -------------  -------
4   A1BG           A1B
5   A1BG           ABG
6   CHP1           CHP
7   CHP1           SLCA1BP
8   NAXE           AIBP
9   NAXE           APOA1BP
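To reproduce the example, the two tables above could be created roughly as follows. This is only a sketch: the column types and primary keys are assumptions, since the original shows just the query output.

```sql
-- Assumed schema: types and constraints are guesses that match the output above.
CREATE TABLE genes (
    id            integer PRIMARY KEY,
    name          text NOT NULL,
    approved_name text NOT NULL
);

CREATE TABLE gene_synonyms (
    id            integer PRIMARY KEY,
    approved_name text NOT NULL,
    synonym       text NOT NULL
);

-- Same rows as shown in the SELECT output above.
INSERT INTO genes (id, name, approved_name) VALUES
    (1, 'AIBP',    'NAXE'),
    (2, 'ABG',     'A1BG'),
    (3, 'CHP',     'CHP1'),
    (4, 'CHP1',    'CHP1'),
    (5, 'SLCA1BP', 'CHP1'),
    (6, 'NAXE',    'NAXE'),
    (7, 'AIBP',    'NAXE'),
    (8, 'APOA1BP', 'NAXE'),
    (9, 'A1B',     'A1BG');

INSERT INTO gene_synonyms (id, approved_name, synonym) VALUES
    (4, 'A1BG', 'A1B'),
    (5, 'A1BG', 'ABG'),
    (6, 'CHP1', 'CHP'),
    (7, 'CHP1', 'SLCA1BP'),
    (8, 'NAXE', 'AIBP'),
    (9, 'NAXE', 'APOA1BP');
```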
Update the gene name in the genes table per the approved names and synonyms in the gene_synonyms table:
# UPDATE genes
SET name = b.approved_name
FROM gene_synonyms AS b
WHERE genes.name <> b.approved_name
AND genes.name = b.synonym;
UPDATE 7
# SELECT * FROM genes ORDER BY id;
id  name  approved_name
--  ----  -------------
1   NAXE  NAXE
2   A1BG  A1BG
3   CHP1  CHP1
4   CHP1  CHP1
5   CHP1  CHP1
6   NAXE  NAXE
7   NAXE  NAXE
8   NAXE  NAXE
9   A1BG  A1BG
# SELECT * FROM gene_synonyms;
id  approved_name  synonym
--  -------------  -------
4   A1BG           A1B
5   A1BG           ABG
6   CHP1           CHP
7   CHP1           SLCA1BP
8   NAXE           AIBP
9   NAXE           APOA1BP
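As a sanity check, a join along the following lines can be used to list the rows in genes whose name is still a known synonym. It is a sketch using only the table and column names shown above. Run before the update, it should list the seven rows reported by UPDATE 7; run afterwards, it should return no rows.

```sql
-- Rows in genes whose name matches a synonym rather than the approved name.
SELECT g.id,
       g.name,
       s.approved_name AS replacement
FROM genes AS g
JOIN gene_synonyms AS s
  ON g.name = s.synonym
WHERE g.name <> s.approved_name
ORDER BY g.id;
```

One caveat: if a synonym were listed under more than one approved name, UPDATE ... FROM would use only one of the matching rows, and which one is not predictable. The sample data has no such duplicates, but the query above would show them as repeated ids.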
This is based on @scott-bailey's answer at SO 2763817; however, that answer is a better fit for this question (in my opinion).