1

I have a table like this.

table_A.tsv

N20 N21 N23 N30 N31 N32
8fa80ed445a221fb64b504e3e605e6c8 48 52 63 58 58 73
9fdfa8e132a820fdbc126575f0d1daa9 29 24 28 29 29 24
dece101df1f2e1a45401ed3c948c635e 23 34 21 24 22 24
80bd783b1e56858db62a78ff5c264e9b 26 27 17 31 24 28
a138bd5162bef8329e50dc55a22b2da2 24 24 15 13 19 17
7db3477cd30c755822b947dfbaea0248 13 15 14 14 23 15

Another table like this.

table_B.tsv

ID Taxon
9fdfa8e132a820fdbc126575f0d1daa9 k__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Chitinophagales; f__Chitinophagaceae; g__Ferruginibacter; s__midas_s_11
8fa80ed445a221fb64b504e3e605e6c8 k__Bacteria; p__Nitrospirota; c__Nitrospiria; o__Nitrospirales; f__Nitrospiraceae; g__Nitrospira; s__Nitrospira_defluvii
dece101df1f2e1a45401ed3c948c635e k__Bacteria; p__Nitrospirota; c__Nitrospiria; o__Nitrospirales; f__Nitrospiraceae; g__Nitrospira; s__Nitrospira_defluvii
80bd783b1e56858db62a78ff5c264e9b k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Burkholderiales; f__Nitrosomonadaceae; g__Nitrosomonas
a138bd5162bef8329e50dc55a22b2da2 k__Bacteria; p__Chloroflexi; c__Anaerolineae; o__Anaerolineales; f__Anaerolineaceae; g__UTCFX1; s__midas_s_12690
7db3477cd30c755822b947dfbaea0248 k__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Cytophagales; f__Microscillaceae; g__OLB12; s__midas_s_21673
6d52b222ce41a2076ab37ac20bbfbc8e k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Xanthomonadales; f__Rhodanobacteraceae; g__Dokdonella
171b7079811de53e028943c19d9c2c04 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Burkholderiales; f__Nitrosomonadaceae; g__Nitrosomonas
bbd1adfb8a1be5131917bf09aab1979e k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Burkholderiales; f__Nitrosomonadaceae; g__Nitrosomonas

I need to produce a table like this.

N20 N21 N23 N30 N31 N32
8fa80ed445a221fb64b504e3e605e6c8 k__Bacteria;p__Nitrospirota;c__Nitrospiria;o__Nitrospirales;f__Nitrospiraceae;g__Nitrospira;s__Nitrospira_defluvii 48 52 63 58 58 73
9fdfa8e132a820fdbc126575f0d1daa9 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Chitinophagales;f__Chitinophagaceae;g__Ferruginibacter;s__midas_s_11 29 24 28 29 29 24
dece101df1f2e1a45401ed3c948c635e k__Bacteria;p__Nitrospirota;c__Nitrospiria;o__Nitrospirales;f__Nitrospiraceae;g__Nitrospira;s__Nitrospira_defluvii 23 34 21 24 22 24
80bd783b1e56858db62a78ff5c264e9b k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Nitrosomonadaceae;g__Nitrosomonas 26 27 17 31 24 28
a138bd5162bef8329e50dc55a22b2da2 k__Bacteria;p__Chloroflexi;c__Anaerolineae;o__Anaerolineales;f__Anaerolineaceae;g__UTCFX1;s__midas_s_12690 24 24 15 13 19 17
7db3477cd30c755822b947dfbaea0248 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Microscillaceae;g__OLB12;s__midas_s_21673 13 15 14 14 23 15

I have used this code but it's proven unsuccessful.

awk 'BEGIN {FS=OFS="\t"} NR > 1 {print $1, $2}' table_B.tsv | sed 's/ //g' | while IFS=$'\t' read -a arr; do sed "s/${arr[0]}/${arr[1]}/" table_A.tsv; done
Tiendu
  • 77
  • 7
  • 3
    if you're going to use `awk` then there's rarely a need to add `sed` and `bash/while` loops to the mix (ie, chances are pretty good you can do the whole operation with a single `awk` script); have you tried a web search on `awk merge join files`? lots of matches like [this](https://stackoverflow.com/q/5467690), [this](https://unix.stackexchange.com/q/410657) and [this](https://askubuntu.com/q/1017612) – markp-fuso May 30 '22 at 14:30
  • 1
    Please [edit] your question to replace the graphical representations of your data with your data. We cant copy/paste a graphical representation to test a potential solution with and it hides important details such as whether or not fields are quoted. Always simply post text - no images, no links, just text. – Ed Morton May 30 '22 at 14:49

4 Answers4

1

I think you should look at join command.

for example a simple line could do the job:

join --header --nocheck-order -t $'\t' table_B.tsv table_A.tsv
Jay jargot
  • 2,745
  • 1
  • 11
  • 14
  • 1
    `--nocheck-order` is set by default, you don't need to set it manually, and it just means you won't be warned if the input isn't sorted but the command will quietly produce incorrect output if the input is unsorted, which appears to be the case from the data shown in the question. – Ed Morton May 30 '22 at 14:52
1
awk -F"\t" -v OFS="\t" 'FNR==NR{a[$1]=$2} FNR!=NR{$1=$1"\t"a[$1];print}' table_B.tsv table_A.tsv
ufopilot
  • 3,269
  • 2
  • 10
  • 12
  • If you change `-vOFS` to `-v OFS` (i.e. with a space between `-v` and `OFS`) then it'll work in any awk, not just gawk. – Ed Morton May 30 '22 at 14:55
1
(sed -u 1q ; sort) < table_B.tsv |
join --header -t $'\t' - \
<( (sed -u 1q; sort) < table_A.tsv ) > new_table.tsv
  • join joins on the matching first fields of the two files.

  • join requires sorted data, the above uses a trick to sort whilst preserving headers. sed -u (unbuffered) is GNU specific.

dan
  • 4,846
  • 6
  • 15
1

CODE

{m,g}awk 'NF*=FNR==NR ? _*(__[$!_]=NR<NF ?" \t ":$-_) : ($!_=__[$!_])^_' OFS='\t' FS='\t+' 

OUTPUT

        N20 N23 N30 N31 N32
8fa80ed445a221fb64b504e3e605e6c8    k__Bacteria;p__Nitrospirota;c__Nitrospiria;o__Nitrospirales;f__Nitrospiraceae;g__Nitrospira;s__Nitrospira_defluvii  48  52  63  58  58  73
9fdfa8e132a820fdbc126575f0d1daa9    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Chitinophagales;f__Chitinophagaceae;g__Ferruginibacter;s__midas_s_11  29  24  28  29  29  24
dece101df1f2e1a45401ed3c948c635e    k__Bacteria;p__Nitrospirota;c__Nitrospiria;o__Nitrospirales;f__Nitrospiraceae;g__Nitrospira;s__Nitrospira_defluvii  23  34  21  24  22  24
80bd783b1e56858db62a78ff5c264e9b    k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Nitrosomonadaceae;g__Nitrosomonas    26  27  17  31  24  28
a138bd5162bef8329e50dc55a22b2da2    k__Bacteria;p__Chloroflexi;c__Anaerolineae;o__Anaerolineales;f__Anaerolineaceae;g__UTCFX1;s__midas_s_12690  24  24  15  13  19  17
7db3477cd30c755822b947dfbaea0248    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Cytophagales;f__Microscillaceae;g__OLB12;s__midas_s_21673 13  15  14  14  23  15
RARE Kpop Manifesto
  • 2,453
  • 3
  • 11