2

I am trying to replace some words in a file with others using sed or awk.

My initial fileA has this format:

>Genus_species|NP_001006347.1|transcript-60_2900.p1:1-843

I have a second fileB with the correspondences like this:

NP_001006347.1 GeneA
XP_003643123.1 GeneB

I am trying to substitute the name in fileA to get this output:

>Genus_species|GeneA|transcript-60_2900.p1:1-843

I was thinking of using awk or sed to do something like 's/$patternA/$patternB/' in a while read loop, but how do I indicate which columns of fileB are pattern 1 and pattern 2? I also tried this, but it does not work:

sed "$(sed 's/^\([^ ]*\) \(.*\)$/s#\1#\2#g/' fileB)" fileA 

Would awk do the job more easily?

Thanks

brian d foy
Nico64

4 Answers

4

It is easier to do this in awk:

awk -v OFS='|' 'NR == FNR {
   map[$1] = $2          # first file: ID -> gene name
   next
}
{
   for (i=1; i<=NF; ++i)
      if ($i in map)     # replace only fields that are known IDs
         $i = map[$i]
} 1' fileB FS='|' fileA

>Genus_species|GeneA|transcript-60_2900.p1:1-843
anubhava
2

Written and tested with your shown samples. Assuming each line of your Input_fileA contains only one NP_digits.digits entry, you could also try the following.

awk '
FNR==NR{
  arr[$1]=$2
  next
}
match($0,/NP_[0-9]+\.[0-9]+/) && ((val=substr($0,RSTART,RLENGTH)) in arr){
  $0=substr($0,1,RSTART-1) arr[val] substr($0,RSTART+RLENGTH)
}
1
'  Input_fileB  Input_fileA
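
The regex above matches only IDs that start with NP_, so the XP_ entry from the sample fileB would never be replaced. As a minimal sketch, assuming XP_ IDs should be handled the same way, the character class can be widened to [NX]P_. The temporary directory and the second header line (the XP_ transcript) are made up for illustration and are not from the question.

```shell
#!/bin/sh
# Hypothetical sample data, recreated in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'NP_001006347.1 GeneA' 'XP_003643123.1 GeneB' > "$tmp/fileB"
printf '%s\n' \
  '>Genus_species|NP_001006347.1|transcript-60_2900.p1:1-843' \
  '>Genus_species|XP_003643123.1|transcript-7_120.p1:1-300' > "$tmp/fileA"

result=$(awk '
  # First file: remember ID -> gene name.
  FNR == NR { arr[$1] = $2; next }
  # Second file: find the first NP_/XP_ ID on the line.
  match($0, /[NX]P_[0-9]+\.[0-9]+/) {
    val = substr($0, RSTART, RLENGTH)
    # Splice in the gene name only when the ID has a mapping.
    if (val in arr)
      $0 = substr($0, 1, RSTART - 1) arr[val] substr($0, RSTART + RLENGTH)
  }
  1
' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Like the original, this replaces only the first matching ID on each line.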
RavinderSingh13
1

Using awk

awk -F'[| ]' -v OFS='|' 'NR==FNR { arr[$1]=$2; next } NR!=FNR && ($2 in arr) { $2=arr[$2] } 1' fileB fileA

Set the field delimiter to either a space or |. Process fileB first (NR==FNR) and create an array called arr, with the first space-delimited field as the index and the second field as the value. Then, for the second file (NR != FNR), check whether the second field has an entry in arr and, if it does, replace the second field with the value from the array. Finally, print each line with the shorthand 1.
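
As a minimal sketch of the two-file idiom described above, the script below recreates the sample files and runs the lookup on them. The temporary directory and the unmapped NP_999999.1 line are assumptions added for illustration; that extra line shows that an ID without an entry in fileB is left untouched when the replacement is guarded by `$2 in arr`.

```shell
#!/bin/sh
# Hypothetical sample data, recreated in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'NP_001006347.1 GeneA' 'XP_003643123.1 GeneB' > "$tmp/fileB"
printf '%s\n' \
  '>Genus_species|NP_001006347.1|transcript-60_2900.p1:1-843' \
  '>Genus_species|NP_999999.1|transcript-1_10.p1:1-100' > "$tmp/fileA"

result=$(awk -v OFS='|' '
  # First file (NR == FNR): build the ID -> gene name map.
  NR == FNR { arr[$1] = $2; next }
  # Second file: replace field 2 only if it is a known ID.
  $2 in arr { $2 = arr[$2] }
  # Print every line, changed or not.
  1
' "$tmp/fileB" FS='|' "$tmp/fileA")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Placing `FS='|'` between the two file names changes the field separator only for fileA, so fileB is still split on whitespace.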

Raman Sailopal
1

You are looking for the join command, which can be used like this:

join -1 1 -2 2 -t'|' <(tr ' ' '|' < fileB | sort -t'|' -k1,1) <(sort -t'|' -k2,2 fileA)

This performs a join on column 1 of fileB with column 2 of fileA. tr is used so that fileB also uses | as its delimiter, because join requires the same delimiter in both files.

Note that the output columns are not in the order you specified. You can swap by piping the output into awk.
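
As an alternative to reordering with awk, join's own `-o` option can select the output fields directly. The following is a minimal sketch under some assumptions: fileA lines always have exactly three |-separated fields, the sorted inputs are written to temporary files (instead of process substitution) so that the script runs in plain sh, and the second header line is made up for illustration.

```shell
#!/bin/sh
# Use one collation order for both sort and join.
LC_ALL=C
export LC_ALL

# Hypothetical sample data, recreated in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'NP_001006347.1 GeneA' 'XP_003643123.1 GeneB' > "$tmp/fileB"
printf '%s\n' \
  '>Genus_species|NP_001006347.1|transcript-60_2900.p1:1-843' \
  '>Genus_species|XP_003643123.1|transcript-7_120.p1:1-300' > "$tmp/fileA"

# join needs both inputs sorted on their join field.
tr ' ' '|' < "$tmp/fileB" | sort -t'|' -k1,1 > "$tmp/map.sorted"
sort -t'|' -k2,2 "$tmp/fileA" > "$tmp/fileA.sorted"

# -o picks: field 1 of fileA, field 2 of the map (gene name), field 3 of fileA.
result=$(join -t'|' -1 1 -2 2 -o 2.1,1.2,2.3 \
  "$tmp/map.sorted" "$tmp/fileA.sorted")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Keep in mind that join emits only lines whose ID has a match, and that sorting changes the line order, so for a real FASTA file with sequence lines the awk approaches are likely the safer choice.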

shilch