I have a names.dmp file that contains taxonomy IDs and scientific names, among other details.
I want to fetch the scientific name for a particular tax ID, and I am currently running this command:
cat names.dmp | grep "scientific name" | awk '$1~/^10090$/{print $0}' | cut -d "|" -f1,2
which gives me the output:
10090 | Mus musculus
But I need this to be dynamic: I want to set a shell variable, e.g. id=10090, and use it inside the awk condition in place of the hard-coded number.
The match on the ID must be exact. When I use "id", I also get entries such as 210090 and 100904 in the output, and those are not wanted.
I am quite inexperienced when it comes to awk, so any help is appreciated.
EDIT:
Here is the example input:
10089 | Mus formosanus Kuroda, 1925 | | authority |
10089 | Mus formosanus | | synonym |
10089 | ricefield mouse | | common name |
10089 | Ryukyu mouse | | genbank common name |
10090 | house mouse | | genbank common name |
10090 | LK3 transgenic mice | | includes |
10090 | mouse | mouse <Mus musculus> | common name |
10090 | Mus musculus Linnaeus, 1758 | | authority |
10090 | Mus musculus | | scientific name |
10090 | Mus sp. 129SV | | includes |
10090 | nude mice | | includes |
10090 | transgenic mice | | includes |
10091 | Mus castaneus | | synonym |
10091 | Mus musculus castaneus | | scientific name |
10091 | Mus musculus castaneus Waterhouse, 1843 | | authority |
10091 | southeastern Asian house mouse | | genbank common name |
10092 | Mus domesticus | | synonym |
10092 | Mus musculus domesticus Schwarz & Scharz 1943 | | authority |
10092 | Mus musculus domesticus | | scientific name |
10092 | Mus musculus praetextus | | synonym |
100902 | Fusarium oxysporum f. sp. conglutinans | | scientific name |
100903 | Fusarium oxysporum f. sp. fragariae | | scientific name |
100905 | Cloning vector pACN | | scientific name |
100906 | Nitrosomonas sp. ENI-11 | | scientific name |
100907 | Chilean sea bass | | common name |
And the output I need is:
10090 | Mus musculus
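For reference, here is a minimal sketch of what the dynamic version might look like. It passes the shell variable to awk with -v and compares the first field with == instead of a regular expression, so no anchors are needed. The file name names_sample.dmp is a hypothetical stand-in holding a few rows from the example above, and the field separator is an assumption based on that example (a pipe with optional spaces or tabs around it); the real names.dmp may need a different -F value.

```shell
# Hypothetical sample file with a few rows copied from the example input.
cat > names_sample.dmp <<'EOF'
10090 | house mouse | | genbank common name |
10090 | Mus musculus Linnaeus, 1758 | | authority |
10090 | Mus musculus | | scientific name |
10091 | Mus musculus castaneus | | scientific name |
100902 | Fusarium oxysporum f. sp. conglutinans | | scientific name |
EOF

id=10090

# -v id="$id" makes the shell variable available inside awk as "id".
# -F splits fields on a pipe with optional spaces/tabs around it (assumed layout).
# $1 == id compares the whole first field, so 100902 is not matched.
# $4 == "scientific name" replaces the separate grep step.
result=$(awk -F '[ \t]*[|][ \t]*' -v id="$id" \
    '$1 == id && $4 == "scientific name" { print $1 " | " $2 }' names_sample.dmp)

echo "$result"
```

With the sample rows above this prints only the 10090 scientific-name line. Using == rather than $1 ~ id is the key design choice here: a regex match on an unanchored variable is a substring match, which is where unwanted IDs such as 100902 would come from.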