1

I have a file such as :

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.5 5/1
CTTCCAATGCTTGCCAAAGTTCATTGTCGTTGTAATTATCGAAAGGATCTAAATTCTTTCTCAACGAACCCGAGAATAGGAAGGGTTCTTGAGGAATTAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFBFFF/FFF
@SRR9110374.6 6/1
ACCGATAATCTTTCCTTCTCAAGAATTTTGTTAATATTCCACATTTTTAAATAGATTTCATTTCTCTCTCTCTTTCTCTCTCTTTTTCTTGTCCTCGATG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFBFFFF///FF
@SRR9110374.7 7/1
GTTGTGCTGAGAATGTTAATAAATTACAAAATGTTATCACTAACTTGGAAATATTCGAATCGACAGATATCGCGTTTGTCGTGTTGTATTAATATATTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.8 8/1
GTCATAGAACGGGGGAGGGGAGGAAGAAGAAAGGAAGGGAAAAAAACGAGAGAGAGAGAGGGGATTACGCTCGCCGTTCGAATCGTTAGGCGTCCGTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFBBFBBFF
@SRR9110374.9 9/1
AATTATTATTTAATCGACGCGTCTATCGATAAATCATCCTCGAATGCTAAGCAAAACTGAACTTCCGCAAATATTGCACACGAAACGTTGAAACAAAG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

and I would like to cat the content untill the Xth occurence savec in the Nb_occurence variable into a new file.

I tried :

Nb_occurence=4
cat file | awk 'BEGIN{ found=0} /@/{found=found+1} {if ( found < $Nb_occurence ) print }'

I should get :

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Ps: The real file is very huge so i I should get a method adapted to that it would be nice.

Inian
  • 80,270
  • 14
  • 142
  • 161
bewolf
  • 165
  • 9

3 Answers3

3

Could you please try following.

Nb_occurence=4
awk -v nb_occur="$Nb_occurence" '
BEGIN{
  occur=0
}
/@/{
  occur++
}
occur>nb_occur{
  exit
}
occur
' Input_file


Ps: The real file is very huge so i I should get a method adapted to that it would be nice.

To make Input_file reading faster:

To FASTER your processing of Input_file I have used exit so once your mentioned number of occurrences are done with reading it will ASAP come out of Input_file, since we need NOT to read it further and thus it should be FASTER than your solution.

RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
2

You should rewrite your awk on this way:

awk -v occurence=$Nb_occurence 'BEGIN{ found=0} /@/{found=found+1} {if ( found < occurence ) print }' file

And you do not need cat, awk can read the file

Romeo Ninov
  • 6,538
  • 1
  • 22
  • 31
2

Yet another awk:

$ awk -v n=4 '/@/&&!n--{exit}1' file

Output:

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Explained:

$ awk -v n=4 '    # -v variable=value is the way to introduce values to awk from the shell
/@/ && !n-- {     # when @ met (n+1)th time
    exit          # ... exit
}1' file          # output
James Brown
  • 36,089
  • 7
  • 43
  • 59