
Is it possible to highlight a search phrase with awk the same way it is using grep?

Given the following file

>gene

AAATTTGCAGAGATTACAGGGGGGG

This grep command

$ grep --color=auto GATTACA file.txt

produces

AAATTTGCAGA**GATTACA**GGGGGGG

where the bold is the colored text.

Because the files I'm actually using have patterns spanning across multiple lines, I'm using awk instead of grep. So instead the files look like this:

>gene

AAATTTGCAGAGAT

TACAGGGGGGG

and I can use the following awk code to print the record with my phrase

awk 'BEGIN{RS=">"; FS="\n";}/GATTACA/{print$0}' file.txt

returning

gene

AAATTTGCAGAGATTACAGGGGGGG

but I would like my pattern to be a color (like the grep cmd):

gene

AAATTTGCAGA**GATTACA**GGGGGGG

Any help would be greatly appreciated, as I'm still very new to Unix and awk. This question is not a duplicate of "How to print awk's results with different colors for different fields?": that question colors entire fields, whereas I want to color only the search term. Since technically I'm printing the whole record with {print$0}, applying the approach from that thread changes the color of my whole returned result:

awk 'BEGIN{RS=">"; FS="\n";}/GATTACA/{print "\033[0;32m"$0"\033[0m"}' file.txt

returns

gene

AAATTTGCAGAGATTACAGGGGGGG

I also tried this:

awk 'BEGIN{RS=">"; FS="\n";}"\033[0;32m"/GATTACA/"\033[0m"{print$0}' file.txt

which just returns the error:

awk: (FILENAME=nametest.txt FNR=1) fatal: division by zero attempted

I'm just not sure how to incorporate the color code into the search term only. It may be that my awk code needs to be entirely reformatted. Please let me know! Thanks again!

codeforester
moxed
  • Tried solution from that thread, but was unable to highlight just the search term. Instead, entire field is colored. Probably has to do with the multiline search and returning the entire field with '{print$0}'. Just not sure how to work around it. Thanks! – moxed May 19 '16 at 03:05
  • To colorize only the search term, you could use gsub with those color codes, like this: `BEGIN { RS=">"; FS="\n"; gene="GATTACA" }` `$0 ~ gene { gsub(gene, "\033[0;32m&\033[0m", $0); print $0 }` Here `&` stands for the text that was matched (GATTACA). – Lars Fischer May 20 '16 at 20:23

1 Answer


Lars Fischer answered this in a comment. This community wiki post formalizes (and improves) it.

To colorize only the search term, you could use a global substitution (gsub) with those color codes:

awk 'BEGIN { RS=">"; FS="\n" } gsub(/GATTACA/, "\033[0;32m&\033[0m", $0)' file

This sets the record separator (RS) to > rather than the default \n (line break), so each record is one header plus its sequence lines, and the field separator (FS) to \n rather than the default, which splits on runs of whitespace. It then performs a global substitution on each record, replacing every match of GATTACA with the same text wrapped in the color codes; the & in the replacement stands for the matched text.
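As a minimal sketch of the splitting and substitution described above, the script below builds a small sample file and runs both steps on it. The temporary file, the second record (>other), and the function names show_records and highlight are made up for illustration; only the awk program inside highlight comes from this answer.

```shell
#!/bin/sh
# Hypothetical sample input: one record containing the motif, one without it.
sample=$(mktemp)
printf '>gene\nAAATTTGCAGAGATTACAGGGGGGG\n>other\nCCCCCCCC\n' > "$sample"

# Show how RS=">" and FS="\n" carve the file into records and fields.
# The text before the first ">" forms an empty record, skipped via $0 != "".
show_records() {
    awk 'BEGIN { RS=">"; FS="\n" }
         $0 != "" { print "header=" $1 ", sequence=" $2 }' "$1"
}

# Wrap every match in green; "&" in the replacement is the matched text.
# The gsub call is the pattern, so only records with a match are printed.
highlight() {
    awk 'BEGIN { RS=">"; FS="\n" } gsub(/GATTACA/, "\033[0;32m&\033[0m", $0)' "$1"
}

show_records "$sample"
highlight "$sample"
```

On a color-capable terminal the second command should show only the gene record, with GATTACA in green; the >other record is not printed because nothing was substituted in it.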

gsub returns the number of substitutions it made, and a pattern without an action prints the record by default. Using the substitution itself as the pattern therefore makes awk print a record if and only if at least one substitution was made: gsub returns zero (false) when nothing matched and a non-zero count (true) otherwise.
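One caveat: if the sequence is wrapped so that the motif itself straddles a line break (GAT at the end of one line, TACA at the start of the next), the record still contains that newline and /GATTACA/ does not match it. The following is a sketch of one possible workaround, not part of the original comment or answer: it joins the sequence lines of each record before substituting. The file contents and the function name highlight_wrapped are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical input: the motif GATTACA is split across two sequence lines.
wrapped=$(mktemp)
printf '>gene\nAAATTTGCAGAGAT\nTACAGGGGGGG\n' > "$wrapped"

# Rebuild each record as "header\nsequence" with the sequence lines joined,
# then highlight. NF is zero for the empty record before the first ">".
highlight_wrapped() {
    awk 'BEGIN { RS=">"; FS="\n" }
    NF {
        seq = ""
        for (i = 2; i <= NF; i++) seq = seq $i   # concatenate sequence lines
        $0 = $1 "\n" seq                         # header, newline, joined sequence
        if (gsub(/GATTACA/, "\033[0;32m&\033[0m")) print
    }' "$1"
}

highlight_wrapped "$wrapped"
```

Note that this prints the sequence on a single line, so the original line wrapping is lost, and a match inside the header line would be highlighted as well.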

Adam Katz