I am a beginner, hoping somebody can help me out. I would like to use awk (or sed) for this task.

I have two files,

File 1

@HWI-M01162:73:000000000-A7TPE:1:1101:17896:1432
@HWI-M01162:73:000000000-A7TPE:1:1101:14465:1433

File 2

@HWI-M01162:73:000000000-A7TPE:1:1101:17896:1432 1:N:0:CTTGTA
CCCCAATGTGATCTGTTTACATTCCAACTCAGCTTCCTCTTGTAAATGTTTTTCTTTTTAC
+
8ABC-6F,C<,CFF9FGGAE9FGFF9<EFF9,CEEEEGGGG9E,,,C<FFFGGGGGGGGGG
@HWI-M01162:73:000000000-A7TPE:1:1101:14465:1433 1:N:0:CTTGTA
CCCATGATGGTACGAAAGTACACATTTTATTTCTTATAAGCAATGGGTTTACTCAGCCTGA
+
@CC<CA9F<<FCFE>87@FCFAF?FFGG9FFFFGGCF9,<EA8FC8EFFGEFGG98@FFC8
@HWI-M01162:73:000000000-A7TPE:1:1101:15447:1444 1:N:0:CTTGTA
CTCTATCTAGAGTTGCCTTCATCAGTTTATCAAAAACACAACCTTAAAAAGGCAACCCCTG
+
AACC@FFGA<@9FF9EEFGF9FF9<FFFEFF99,,<B8C<8@FFGD,,,,,:C8<FEFEF8
@HWI-M01162:73:000000000-A7TPE:1:1101:15876:1444 1:N:0:CTTGTA
CTCTTTATTTAGTTTTAACTTATCTCAAAAATTACTCGACCTAAAAAATTTGGCCTGTTTA
+
-8CCCG-EFFA9FFFG98FFF9FEEE9,,,,CFAEFF7,@CF,,,,,+CEF9,CFF8EFF,

The output I want is as follows. If a line in file 1 matches a line in file 2, print that line from file 2 together with the three lines that follow it. The lines in file 1 are all unique IDs.

@HWI-M01162:73:000000000-A7TPE:1:1101:17896:1432 1:N:0:CTTGTA
CCCCAATGTGATCTGTTTACATTCCAACTCAGCTTCCTCTTGTAAATGTTTTTCTTTTTAC
+
8ABC-6F,C<,CFF9FGGAE9FGFF9<EFF9,CEEEEGGGG9E,,,C<FFFGGGGGGGGGG
@HWI-M01162:73:000000000-A7TPE:1:1101:14465:1433 1:N:0:CTTGTA
CCCATGATGGTACGAAAGTACACATTTTATTTCTTATAAGCAATGGGTTTACTCAGCCTGA
+
@CC<CA9F<<FCFE>87@FCFAF?FFGG9FFFFGGCF9,<EA8FC8EFFGEFGG98@FFC8

2 Answers

$ awk 'NR==FNR{a[$0];next} $1 in a{c=4} c&&c--' file1 file2
@HWI-M01162:73:000000000-A7TPE:1:1101:17896:1432 1:N:0:CTTGTA
CCCCAATGTGATCTGTTTACATTCCAACTCAGCTTCCTCTTGTAAATGTTTTTCTTTTTAC
+
8ABC-6F,C<,CFF9FGGAE9FGFF9<EFF9,CEEEEGGGG9E,,,C<FFFGGGGGGGGGG
@HWI-M01162:73:000000000-A7TPE:1:1101:14465:1433 1:N:0:CTTGTA
CCCATGATGGTACGAAAGTACACATTTTATTTCTTATAAGCAATGGGTTTACTCAGCCTGA
+
@CC<CA9F<<FCFE>87@FCFAF?FFGG9FFFFGGCF9,<EA8FC8EFFGEFGG98@FFC8
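
For readers new to awk, the one-liner above can be read as three rules. The following is a minimal sketch of the same logic spelled out with comments; the file names `ids.txt` and `reads.fastq`, the shortened read IDs, and the variable names `ids` and `remaining` are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical sample inputs, shortened so the example is self-contained.
tmp=$(mktemp -d)
printf '%s\n' '@read1' '@read3' > "$tmp/ids.txt"
printf '%s\n' \
  '@read1 1:N:0:CTTGTA' 'ACGT' '+' 'FFFF' \
  '@read2 1:N:0:CTTGTA' 'TTTT' '+' '@FFF' \
  '@read3 1:N:0:CTTGTA' 'GGGG' '+' 'GGGG' > "$tmp/reads.fastq"

out=$(awk '
  # First file (NR == FNR): remember each ID as an array key, print nothing.
  NR == FNR { ids[$0]; next }
  # Second file: a line whose first field is a known ID starts a 4-line countdown.
  $1 in ids { remaining = 4 }
  # While the countdown is positive, print the current line and decrement it.
  remaining > 0 { print; remaining-- }
' "$tmp/ids.txt" "$tmp/reads.fastq")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the lookup compares the whole first field with `in`, a quality line that merely starts with `@` (such as `@FFF` here) does not trigger a match. One thing to keep in mind is that the `NR == FNR` test assumes the first file is not empty.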
Ed Morton

This awk will do that:

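awk 'FNR==NR{k[FNR]=$0; next}            # first file: store each ID by line number
     {data[FNR]=$0; next}                # second file: store every line
     END{
       for (i=1; i in k; i++) {          # for each ID, in the order given in f1
           for (j=1; j in data; j++) {
                if (data[j] ~ k[i]) {    # the ID is used as a regex against the line
                   for (x=0; x<4; x++)   # print the header plus the next three lines
                       print data[j+x]
                   j += x-1              # jump to the end of this record; j++ then lands on the next header
                }
           }
        }
      }' f1 f2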
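
One caveat about the `~` test in the script above: awk treats the right-hand side as a regular expression and looks for it anywhere in the line, so an ID that is a prefix of another ID would also match. The sketch below illustrates the difference using made-up values (`@run:1:143` and `@run:1:1432` are hypothetical, not taken from the question's data).

```shell
# Hypothetical ID that is a prefix of the ID on the header line.
id='@run:1:143'
line='@run:1:1432 1:N:0:CTTGTA'

# Regex match: the ID is treated as a pattern and may match anywhere in the line.
regex_hit=$(printf '%s\n' "$line" | awk -v id="$id" '$0 ~ id { print "match" }')

# Exact comparison of the first whitespace-separated field against the ID.
exact_hit=$(printf '%s\n' "$line" | awk -v id="$id" '$1 == id { print "match" }')

printf 'regex: %s\nexact: %s\n' "${regex_hit:-no match}" "${exact_hit:-no match}"
```

With these values the regex test reports a match and the exact test does not. For the IDs shown in the question this makes no difference, but comparing `$1` exactly is the safer choice if read IDs can vary in length.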
dawg