
I'm using the macOS command line and have two files:

File A.txt

A
B
F

File B.txt

>A
abcde
>B
efghi
>C
jklmn
>D
opqrs
>E
tuvwx
>F
yz123

I want to loop over A.txt with a while loop and print only the corresponding headers and content from B.txt:

>A
abcde
>B
efghi
>F
yz123

This line works when I run it for each line of A.txt individually: `grep -n "\A\,\>\{x;p;}" B.txt`

But when I do this: `while read i; do grep -n "\$i\,\>\{x;p;}" B.txt >> newfile.txt; done < A.txt`

I get this error: `grep: invalid repetition count(s)`

What am I doing wrong?

tripleee
cms72
  • @tripleee Let's start a project for convenient handling of fasta files. This format seems to be widely used with no tools other than grep, sed, awk. That's actually scary (to me) given the purpose it is used for :) – hek2mgl Mar 04 '19 at 12:31
  • Is that really `grep`? `grep -n "\A\,\>\{x;p;}" B.txt` looks like it should be `sed -n "\A\,\>\{x;p;}" B.txt` – William Pursell Mar 04 '19 at 12:37
  • Hi @tripleee - it worked! – cms72 Mar 04 '19 at 12:38
  • and @hek2mgl - it would be very helpful. Looking for answers is only as good as my Google search. Thank you both very much for your help. – cms72 Mar 04 '19 at 12:38
  • @hek2mgl The problem is not lack of tools, they have BioPerl, BioPython etc ... it's just that biotech people are often just learning basic U*x. – tripleee Mar 04 '19 at 12:38
  • @cms72 Please accept the duplicate nomination so that this question no longer comes up as unresolved. – tripleee Mar 04 '19 at 12:39
  • @William Pursell - for some reason, sed works on linux fine, but not macos. Thank you for the suggestion though! – cms72 Mar 04 '19 at 12:41
  • @tripleee accepted duplicate! And yes, I use bioperl and biopython - very handy. But I do the simple data manipulation on the command line. I still google everything when I'm stuck on some code - it's just knowing what to google. It would be nice to have a database for handling fasta files... so much easier to search. Thanks again! – cms72 Mar 04 '19 at 12:46
  • The `sed` dialect on MacOS is slightly different. It can generally speaking do the same things as Linux `sed`, with a few rather exotic exceptions; but you have to know the differences if you want to translate from one dialect to another. – tripleee Mar 04 '19 at 12:48
  • Hmmm I see. Thanks @tripleee ! – cms72 Mar 04 '19 at 12:57

1 Answer


With grep you could use:

/bin/grep -A1 -Ff fileA fileB 
>A
abcde
>B
efghi
--               <--- produces separators
>F
yz123
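
One caveat with `-F` is that a pattern such as `A` matches any line containing that letter, not only the header `>A`. A minimal sketch of one way around this, assuming the IDs contain no regex metacharacters: turn each ID into an anchored pattern and filter out the `--` separators afterwards. The scratch directory and the names `patterns.txt` and `selected.txt` are only illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' A B F > "$workdir/A.txt"
printf '%s\n' '>A' abcde '>B' efghi '>C' jklmn '>D' opqrs '>E' tuvwx '>F' yz123 > "$workdir/B.txt"

# Turn each ID into an anchored pattern such as ^>A$ so that it can only
# match a complete header line (assumes no regex metacharacters in the IDs).
sed 's/^/^>/; s/$/$/' "$workdir/A.txt" > "$workdir/patterns.txt"

# -A1 prints one line after each match; the second grep drops the "--"
# group separators that -A1 emits between non-adjacent matches.
grep -A1 -f "$workdir/patterns.txt" "$workdir/B.txt" | grep -v '^--$' > "$workdir/selected.txt"
cat "$workdir/selected.txt"
```

Note that the final filter would also drop a sequence line consisting of exactly `--`, which should not occur in ordinary sequence data.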

Alternatively with awk:

awk 'NR==FNR{a[$0];next}{sub(/^>/,"")} $0 in a {print ">"$0;p=1;next} p{print;p=0}' fileA fileB 
>A
abcde
>B
efghi
>F
yz123
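
Both commands above print exactly one line after each matching header. If a record can span several lines, a variant along the following lines may be more suitable. It is only a sketch: the multi-line record for `B` is made-up test data, and `want` and `keep` are arbitrary variable names.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B F > "$workdir/A.txt"
# Hypothetical variant of B.txt in which record B spans two lines.
printf '%s\n' '>A' abcde '>B' efghi EFGHI '>C' jklmn '>F' yz123 > "$workdir/B.txt"

awk '
  # First file: remember every wanted ID.
  NR == FNR { want[$0]; next }
  # Header line: decide once whether the whole record is wanted.
  /^>/ { keep = (substr($0, 2) in want) }
  # Print every line while inside a wanted record.
  keep
' "$workdir/A.txt" "$workdir/B.txt" > "$workdir/selected.txt"
cat "$workdir/selected.txt"
```

Because the decision is made only on header lines, all following lines up to the next `>` inherit it, and the ID comparison is an exact match on the text after `>`.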
hek2mgl