I've got a bash script that handles a bunch of input and then prints out prettily-formatted output. At the moment it's very modular -- it spawns tons of subshells and uses echo, grep, sort, wc, & sed a lot, but I'm working on replacing the functionality of multiple chunks with larger awk chunks, for better efficiency.

One struggle: I've been trying to figure out how to search input for specific strings, only printing the exact thing I'm searching for. I've been playing with awk's match function but haven't had any success yet. Here's an example of one thing I'm trying to figure out how to integrate into a larger awk script:

$ egrep -o "pae|lm|vmx|svm|ht" /proc/cpuinfo | sort -u
ht
lm
pae
vmx

If I were to use awk to do the same thing, I'd want to end up with an array or variable containing each string I searched for that it found. The main problem as I see it is that each string I'm searching for might exist more than once in the input. Maybe I just need to buy an awk book... Any feedback welcome.

rsaw

2 Answers

In awk, perhaps this is what you're looking for, or it may at least contain some helpful code:

awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^(pae|lm|vmx|svm|ht)$/) array[$i]++ } END { for (j in array) print j }' /proc/cpuinfo

Output on my system:

vmx
pae
lm
ht

HTH

Steve

I think this will do your job:

gawk -v RS="pae|lm|vmx|svm|ht" 'RT != "" {print RT}' /proc/cpuinfo

Or if you also need to do sorting in awk:

gawk -v RS="pae|lm|vmx|svm|ht" 'RT != "" {m[RT]} END{n=asorti(m, m_sorted); for(i=1;i<=n;++i){print m_sorted[i]}}' /proc/cpuinfo

Explanation: we set the record separator RS to the regex we're searching for, and gawk stores the exact text that matched RS in the RT variable. RT is empty for the last record, so we need to check that it's non-empty.

The sorting version uses the asorti function, which is a gawk extension.
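If gawk isn't available, one portable alternative is to let awk pipe its results to sort. Below is a minimal sketch of that idea, not a drop-in replacement: it matches whole whitespace-separated fields instead of using RS/RT, the function name unique_flags is made up for illustration, and the sample input is invented to stand in for /proc/cpuinfo.

```shell
# unique_flags is a hypothetical helper name; it reads text on stdin.
unique_flags() {
    awk '
    {
        # Test every whitespace-separated field against the wanted names.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(pae|lm|vmx|svm|ht)$/)
                seen[$i] = 1
    }
    END {
        # Send each distinct name to sort(1) instead of sorting inside awk.
        for (name in seen)
            print name | "sort"
        close("sort")
    }'
}

# Made-up sample standing in for /proc/cpuinfo.
printf 'flags : fpu pae ht lm\nflags : fpu pae ht lm vmx\n' | unique_flags
```

Because the array is keyed by the matched string, duplicates collapse automatically, which gives the same effect as sort -u in the original pipeline.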

For a more general approach, look into the match function (its three-argument form is a gawk extension). For example, if you can choose a record separator such that a match occurs at most once per record, the solution isn't complicated:

gawk -v RS="your_separator" 'match($0, /pae|lm|vmx|svm|ht/, m) {print m[0]}'
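When the same string can appear several times in one record, a loop around the two-argument match (which sets RSTART and RLENGTH in POSIX awk) is one way to collect every occurrence. This is only a sketch under a few assumptions: the function name all_matches is hypothetical, the sample input is invented, and like egrep -o it matches substrings, so "ht" inside a longer word would also count.

```shell
# all_matches is a hypothetical helper name; it reads text on stdin.
all_matches() {
    awk '
    {
        rest = $0
        # match() sets RSTART and RLENGTH for the leftmost match in rest.
        while (match(rest, /pae|lm|vmx|svm|ht/)) {
            found[substr(rest, RSTART, RLENGTH)]++
            # Drop everything up to the end of this match, then search again.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }
    END {
        # Print each distinct string with the number of times it was seen.
        for (s in found)
            print s, found[s]
    }' | sort
}

# Made-up sample standing in for /proc/cpuinfo.
printf 'flags : fpu pae ht lm\nflags : fpu pae ht lm vmx\n' | all_matches
```

The found array ends up holding each string that was searched for and seen at least once, along with its count, so the END block could be swapped for whatever the larger script needs to do with those results.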

Alexander Putilin
  • Thanks for this. I'll see if I can get what I need out of it when I get back to my coding machine. – rsaw Jul 08 '12 at 17:13