
I would like to count the lines where column 7 matches a regular expression like 3'UTR, but I do not know how to make this work with the ' character. Does anyone have any ideas? Thank you very much!

awk -F "\t" '$7 ~ /3'UTR/ {print}' a.txt | wc -l

Thanks,

Xiayu

  • It sounds like you are trying to [escape single quotes](http://stackoverflow.com/a/1250279/391161). – merlin2011 Aug 15 '14 at 21:53
  • Also, for small cases like this it can be easy enough to use double quotes around the awk command and escape the $ as in `awk "\$7 ~ /3'UTR/ {print}"` – jas Aug 15 '14 at 22:12
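As a minimal sketch of the two quoting approaches mentioned in the comments above (the sample rows and the temporary file are made up for illustration, not the OP's data):

```shell
# Hypothetical two-row, tab-separated sample with the feature type in column 7.
tmp=$(mktemp)
printf "a\tb\tc\td\te\tf\t3'UTR\n" > "$tmp"
printf "a\tb\tc\td\te\tf\tCDS\n" >> "$tmp"

# Approach 1: close the single-quoted script, add an escaped quote (\'),
# then reopen the single-quoted script.
n1=$(awk -F '\t' '$7 ~ /3'\''UTR/ {print}' "$tmp" | wc -l | tr -d '[:space:]')

# Approach 2: double-quote the script and escape $ so the shell
# does not expand $7 before awk sees it.
n2=$(awk -F '\t' "\$7 ~ /3'UTR/ {print}" "$tmp" | wc -l | tr -d '[:space:]')

echo "$n1 $n2"
rm -f "$tmp"
```

Both commands should report the same count, since only the shell-level quoting differs; awk receives the same script text either way.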

2 Answers


You cannot include a single quote inside a single-quote-delimited script. There is no ideal solution; every option has caveats and drawbacks. IMHO the best approach is to use the octal escape sequence for a single quote (i.e. \047). It doesn't require any tricky quoting, escaping, or variables, which can lead to string concatenation issues, and it will work in any modern awk on any platform:

$7 ~ /3\047UTR/

Its only drawback is having to remember that that's what \047 means :-).

By the way, you don't need a pipe to wc; your script can just be:

awk -F '\t' '$7~/3\047UTR/{c++} END{print c+0}' a.txt
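
To try this out, here is a minimal sketch that runs the same command against a small made-up file (the rows and the temporary file are assumptions for illustration only):

```shell
# Hypothetical tab-separated sample: two rows have 3'UTR in column 7, one has 5'UTR.
tmp=$(mktemp)
printf "chr1\t100\t200\tgeneA\t.\t+\t3'UTR\n"  > "$tmp"
printf "chr1\t300\t400\tgeneB\t.\t-\t5'UTR\n" >> "$tmp"
printf "chr2\t500\t600\tgeneC\t.\t+\t3'UTR\n" >> "$tmp"

# \047 is the octal code for a single quote, so no shell quoting tricks are needed.
# c+0 forces numeric output, so 0 is printed when nothing matches.
count=$(awk -F '\t' '$7 ~ /3\047UTR/ {c++} END {print c+0}' "$tmp")
echo "$count"
rm -f "$tmp"
```

With this sample the count is 2, because the 5'UTR row does not match.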
Ed Morton

Here is another way to do it:

awk '$7~test {a++} END {print a+0}' test="3'UTR" file

You do not need to set the field separator to tab, since awk splits on both tabs and spaces by default.
Since a single quote cannot appear inside a single-quoted awk script, you can define the pattern outside the script instead. This way you do not need to remember the escape code. The +0 makes awk print 0 if no line matches; without it, awk would print an empty line.

You can also add the variable at the beginning.

awk -v test="3'UTR" '$7~test {a++} END {print a+0}' file
Jotne
  • The problem with that is that because you're specifying the RE as a string it gets parsed twice, once when the script is read and then again when it's executed in the RE context, so you'd need to double-escape anything you'd want to have escaped. Also because it's inside double quotes anything you prefix with a `$` would get expanded by the shell so THAT'd need to be quoted and quoted again. It's just more complicated to deal with. Also, since the OP is using a tab as the FS it's possible his fields contain blank chars and if so you can't use the default FS just because it includes tabs. – Ed Morton Aug 17 '14 at 15:28
  • @EdMorton, I thought my solution was ingenious :), but it was obviously wrong... Thanks for the in-depth explanation. Learning new stuff every day. – Jotne Aug 17 '14 at 20:08
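
The field separator point raised in the comment above can be illustrated with a minimal sketch. The sample row is made up, and it assumes a tab-separated file in which one field contains a space:

```shell
# Hypothetical row: column 4 is "gene A", which contains a blank.
tmp=$(mktemp)
printf "chr1\t100\t200\tgene A\t.\t+\t3'UTR\n" > "$tmp"

# Default FS splits on any run of blanks or tabs, so "gene A" becomes two
# fields and $7 is "+" rather than 3'UTR.
default_fs=$(awk '$7 ~ /3\047UTR/ {c++} END {print c+0}' "$tmp")

# With FS set to a tab, "gene A" stays one field and $7 is 3'UTR.
tab_fs=$(awk -F '\t' '$7 ~ /3\047UTR/ {c++} END {print c+0}' "$tmp")

echo "default FS: $default_fs, tab FS: $tab_fs"
rm -f "$tmp"
```

On this row the default separator reports 0 matches and the tab separator reports 1, which is why `-F '\t'` matters whenever fields may contain blanks.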