
I have a file in the following format:

line one  
line two <% word1  %> text <% word2 %>  
line three <%word3%>  

I want to use Linux shell tools such as awk or sed to extract all the words enclosed in <% %>.
The result should be:

word1  
word2  
word3  

Thanks for the help.

I forgot to mention: I am in an embedded environment, and grep there has no -P option.

alzhao

5 Answers


With GNU awk, which allows RS to be a regular expression matching multiple characters:

$ gawk -v RS='<% *| *%>' '!(NR%2)' file
word1
word2
word3

With any modern awk:

$ awk -F'<% *| *%>' '{for (i=2;i<=NF;i+=2) print $i}' file
word1
word2
word3
Ed Morton
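To see why the loop starts at field 2 and steps by 2, it may help to print every field of the second sample line with its index. This is only an illustration: it uses the same -F regex as above, but pipes one sample line in with printf instead of reading file. The odd fields hold the text around the markers and the even fields hold the words:

```shell
# Dump each field of one sample line, using the same field-separator regex
# as the answer above; the sample is piped in so nothing reads "file".
printf '%s\n' 'line two <% word1  %> text <% word2 %>' |
awk -F'<% *| *%>' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
```

This should print `1:[line two ]`, `2:[word1]`, `3:[ text ]`, `4:[word2]` and `5:[]`. The empty last field comes from the `%>` at the very end of the line, which is another reason to print only the even indices.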

You could do it with grep, provided it supports -P (a GNU grep option for Perl-compatible regular expressions):

$ grep -oP '(?<=<%).+?(?=%>)' file
 word1  
 word2 
word3
user000001
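Since the question rules out -P, here is a sketch of a variant without lookarounds. It assumes your grep at least supports -o (a common extension, not required by POSIX) and that the text between the markers never contains a % character. grep -o prints each whole <% ... %> match on its own line, and sed then trims the markers and the blanks next to them. The sample is piped in with printf so the snippet runs on its own:

```shell
# Assumes grep supports -o and that no % appears between the markers.
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' |
grep -o '<%[^%]*%>' |        # one "<% ... %>" match per output line
sed 's/^<% *//; s/ *%>$//'   # strip the opening/closing markers plus blanks
```

To run it on real data, drop the printf line and pass file as grep's argument. Unlike the -P version, this prints word1, word2 and word3 without the surrounding spaces.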

This works for your sample:

sed -ne 's/%>/&\n/gp' < sample.txt | sed -ne 's/.*<%\s*\(.*[^[:space:]]\)\s*%>.*/\1/p'

The first sed just puts a line break after every closing %>, as preparation.

The second sed extracts the text inside <% ... %>, without leading and trailing whitespace.

In both commands, the -n flag combined with the p flag of s/// limits the output to the lines where a substitution happened, so only the relevant lines are passed on.

janos
Just be aware there are 2 non-portable sed constructs in the above: a) use of `\n` as a newline (a backslash followed by a literal newline is portable) and b) use of `\s` to represent a space character (`[[:blank:]]` is POSIX, but in this case a literal blank char is probably adequate). I'm surprised your sed works with those when your grep doesn't support `-P`. – Ed Morton Aug 24 '13 at 13:04
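Following that comment, here is a sketch of the same two-stage sed pipeline using only the portable constructs it mentions: a backslash followed by a real newline in the replacement, and [[:blank:]] instead of \s. The sample is piped in with printf so it runs as-is. It also assumes the text inside each marker pair is non-empty, since the capture has to end on a non-blank character:

```shell
# Stage 1: put a newline after every "%>" (backslash + literal newline is POSIX).
# Stage 2: keep only the text between "<%" and "%>", minus surrounding blanks.
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' |
sed -n 's/%>/&\
/gp' |
sed -n 's/.*<%[[:blank:]]*\(.*[^[:blank:]]\)[[:blank:]]*%>.*/\1/p'
```

For a real file, drop the printf line and give the file name to the first sed.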

Using awk:

awk -F '<% *| *%>' '{for(i=2; i<=NF; i+=2) print $i}' file
word1
word2
word3
anubhava

This might work for you (GNU sed):

sed '/<%\s*/!d;s//\n/;s/[^\n]*\n//;s/\s*%>/\n/;P;D' file
potong
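The one-liner above packs six sed commands together. Spread over lines with comments, it may be easier to follow. This is the same GNU sed script (it still relies on \s and on \n in the replacement and bracket expression), with the sample piped in via printf. The comments describe what each command does under GNU sed semantics:

```shell
printf '%s\n' 'line one' 'line two <% word1  %> text <% word2 %>' 'line three <%word3%>' |
sed '
  # Delete the pattern space unless it still contains an opening <% marker.
  /<%\s*/!d
  # An empty regex reuses the last one: replace the first <% and its blanks with a newline.
  s//\n/
  # Remove everything up to and including that newline, i.e. the text before <%.
  s/[^\n]*\n//
  # Replace the first closing %> and the blanks before it with a newline.
  s/\s*%>/\n/
  # Print up to the first newline: the extracted word.
  P
  # Delete up to the first newline and restart on the rest without reading new input.
  D
'
```

The P;D pair is what lets one input line yield several words: after each word is printed, the remainder of the line goes back through the script until no <% is left, at which point the first command deletes it.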