I have a text file, file1, that contains IDs like:

  c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like  
  c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like

I used grep -o '^[^|]*' file1 to extract the string before the first | on each line of file1.

I want to match each of these extracted strings against the lines of another file, file2, and return the whole line when there is a match. file2 looks like this:

  c10013_g2_i1  781 622.2   73  5.95    5.16  
  c10014_g1_i1  213 58.67   3   2.59    2.25  
  c10014_g2_i1  341 182.35  4   1.11    0.96  
  c10015_g1_i1  404 245.23  16  3.31    2.87  
  c10017_g1_i1  263 105.37  6   2.89    2.5 

Finally the result should look like:

c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like 781    622.2   73  5.95    5.16  
c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like 404   245.23  16  3.31    2.87
Mathews Jose
kanika

3 Answers

You can use awk:

awk 'FNR == NR {
   split($0, a, /[|]/)
   seen[a[1]] = $0
   next
}
$1 in seen {
   $1 = seen[$1]
   print
}' file1 file2

c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like   781 622.2 73 5.95 5.16
c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like 404 245.23 16 3.31 2.87
anubhava

For structured text, awk is the king of tools.

$ awk 'NR==FNR{split($0,v,"|");a[v[1]]=$0; next} 
       $1 in a{k=$1; $1=""; print a[k] $0}' file1 file2  

c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like   781 622.2 73 5.95 5.16
c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like 404 245.23 16 3.31 2.87
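Note that assigning to $1 makes awk rebuild the record with single spaces between fields, so the original column spacing of file2 is lost. If that spacing matters, one option is to strip the id from a copy of the line with sub() instead. The following is only a sketch: it assumes file2 is whitespace-separated as in the question, and the temp directory and the `rest` variable are just names chosen for this illustration.

```shell
# Demo data copied from the question; the temp directory is only for this sketch.
tmp=$(mktemp -d)
printf '%s\n' \
  'c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like' \
  'c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like' > "$tmp/file1"
printf '%s\n' \
  'c10013_g2_i1  781 622.2   73  5.95    5.16' \
  'c10014_g1_i1  213 58.67   3   2.59    2.25' \
  'c10015_g1_i1  404 245.23  16  3.31    2.87' > "$tmp/file2"

result=$(awk '
  NR == FNR { split($0, v, "|"); a[v[1]] = $0; next }   # index file1 lines by id
  $1 in a {
    rest = $0
    sub(/^[ \t]*[^ \t]+/, "", rest)   # remove the id only, leave the spacing alone
    print a[$1] rest
  }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
```

Because the file1 line is concatenated rather than passed as the replacement argument of sub(), a literal & in a description cannot be misread as a back-reference.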
karakfa

Sounds like you're trying to join on the first field of each file. There's actually a join command that can do this. You'll need to change file1 slightly (join works on spaces):

sed 's/^\([^|]*\)[|]/\1 |/' file1 | sort > file1-delimited

Then you can join them:

sort file2 | join file1-delimited -

c10013_g2_i1 |m.63|vomeronasal type-1 receptor 4-like  781 622.2 73 5.95 5.16
c10015_g1_i1 |m.409|vomeronasal type-1 receptor 1-like 404 245.23 16 3.31 2.87

This should get you 95% of the way there, but the format might not be perfect: the output keeps the space that sed inserted before the first |.
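One way to close that gap is to remove the space again after the join. A minimal sketch, assuming no id contains the sequence " |" and that sort and join run under the same locale (forced to C here as a precaution); the temp directory and demo files only reproduce the sample data from the question:

```shell
# Demo data copied from the question; the temp directory is only for this sketch.
tmp=$(mktemp -d)
printf '%s\n' \
  'c10013_g2_i1|m.63|vomeronasal type-1 receptor 4-like' \
  'c10015_g1_i1|m.409|vomeronasal type-1 receptor 1-like' > "$tmp/file1"
printf '%s\n' \
  'c10013_g2_i1  781 622.2   73  5.95    5.16' \
  'c10014_g1_i1  213 58.67   3   2.59    2.25' \
  'c10015_g1_i1  404 245.23  16  3.31    2.87' > "$tmp/file2"

export LC_ALL=C   # sort and join must agree on collation order

# Insert a space after the id so join sees it as a separate field.
sed 's/^\([^|]*\)[|]/\1 |/' "$tmp/file1" | sort > "$tmp/file1-delimited"

# Join on the first field, then undo the inserted space (first " |" per line).
result=$(sort "$tmp/file2" | join "$tmp/file1-delimited" - | sed 's/ |/|/')
printf '%s\n' "$result"
```

The columns taken from file2 still come out separated by single spaces, because join rewrites the field separators.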

David Ehrmann