I have a file, outputSequence.txt, containing the text below:

aatgcacatgttgcatatcaagtggatatgggtggtggaaaactgtataatggccaagcc
aatttccgtttattatttgacccaactcaagcagtagctattccgagtagcgaatttcca

I am trying to find a grep and wc command that finds and counts every "a" and "g" in the file.

I have previously tried using

egrep 'a|g' outputSequence.txt|wc -c

I took the 'a|g' pattern from: https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns-with-pattern-having-a-pipe-character

I have also tried:

grep -o 'a|g' outputSequence.txt | wc -l

which outputs 0.

I cannot find a resource that shows how to grep for and count both a and g in each line.

KamilCuk
Gil Ong
  • Are you trying to solve the problem using any method or do you need to use grep and wc? Your task might be easier to solve using awk such as discussed in this [Count Occurrences of char in string](https://stackoverflow.com/questions/16679369/count-occurrences-of-a-char-in-a-string-using-bash) SO thread. – Dan Breidegam Sep 01 '19 at 23:02

1 Answer


Your approach is close, but it needs a few fixes:

  • grep -o 'a|g' searches for the literal string a|g, that is, a followed by | followed by g. In a basic regular expression an unescaped | is an ordinary character, so you need to escape it to make it an OR. grep -o 'a\|g' matches the letter a OR the letter g in the input and prints each matched a or g on its own line.
  • Then use sort | uniq -c to group the letters and print the count of each.
  • Don't use egrep, it's deprecated. Use grep -E instead; with -E, the | in 'a|g' acts as an OR without a backslash.
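The difference between the three pattern forms can be checked on the sample data from the question. This is only a minimal sketch: it assumes a grep that supports -o (GNU and BSD grep do), and it writes the two sequence lines to a temporary file that stands in for outputSequence.txt. The variable names are made up for illustration.

```shell
#!/bin/sh
# Temporary stand-in for outputSequence.txt (the two lines from the question).
seq_file=$(mktemp)
printf '%s\n' \
  'aatgcacatgttgcatatcaagtggatatgggtggtggaaaactgtataatggccaagcc' \
  'aatttccgtttattatttgacccaactcaagcagtagctattccgagtagcgaatttcca' > "$seq_file"

# Basic regex: the unescaped | is a literal pipe, so nothing matches.
literal_count=$(grep -o 'a|g' "$seq_file" | wc -l)

# Extended regex (-E): | is alternation, no backslash needed.
ere_count=$(grep -oE 'a|g' "$seq_file" | wc -l)

# Bracket expression: matches a or g in both basic and extended regex.
class_count=$(grep -o '[ag]' "$seq_file" | wc -l)

echo "literal: $literal_count, -E: $ere_count, [ag]: $class_count"
rm -f "$seq_file"
```

The first count should be 0, which is the result reported in the question, while the other two should agree with each other.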

The command:

grep -o 'a\|g' outputSequence.txt | sort | uniq -c

should output:

 36 a
 26 g

But if you want the combined count of a's and g's instead, you were close:

grep -o 'a\|g' outputSequence.txt | wc -l
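If grep is not a hard requirement, the same totals can be cross-checked with tr and wc -c. This is an alternative technique, not part of the answer above, sketched under the assumption that the file holds only single-byte characters; the temporary file again stands in for outputSequence.txt.

```shell
#!/bin/sh
# Temporary stand-in for outputSequence.txt (the two lines from the question).
seq_file=$(mktemp)
printf '%s\n' \
  'aatgcacatgttgcatatcaagtggatatgggtggtggaaaactgtataatggccaagcc' \
  'aatttccgtttattatttgacccaactcaagcagtagctattccgagtagcgaatttcca' > "$seq_file"

# tr -cd 'ag' deletes every character that is NOT a or g (newlines included),
# so wc -c counts exactly the remaining a and g bytes.
total=$(tr -cd 'ag' < "$seq_file" | wc -c)

# The same idea, one letter at a time.
count_a=$(tr -cd 'a' < "$seq_file" | wc -c)
count_g=$(tr -cd 'g' < "$seq_file" | wc -c)

echo "a: $count_a, g: $count_g, total: $total"
rm -f "$seq_file"
```

On the sample data the per-letter counts should match the 36 and 26 shown above, and the total should be their sum.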
KamilCuk
  • Character classes would do the job nicely: `grep -o '[actg]' | sort | uniq -c` to count the occurrences of each DNA base, or `grep -o '[ag]' | sort | uniq -c` to count only `a` and `g`. – Léa Gris Sep 01 '19 at 23:31
  • There's also `grep -o -e a -e g | wc -l`. – root Sep 02 '19 at 07:00