using sed for extracting multiple matches

Question

I have the following line:

echo AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine|sed 's/.*\(ZZ:Z:.*[ ]\).*/\1/g'

which outputs:

ZZ:Z:cas.sup

I'd like to use sed to extract both ZZ:Z entries from the given line (please avoid awk, since the position of the ZZ:Z entries may differ from line to line in my file).

Preferred output:

ZZ:Z:mus.sup  ZZ:Z:cas.sup

Or possibly:

ZZ:Z:mus.sup  
ZZ:Z:cas.sup

Thanks.

score 1 · Answer 1 · answered Nov 12 '16 at 07:18

You can certainly do this with sed, but wouldn't a tr and grep solution be more natural? You seem to actually have separate logical records, even though they appear on a single line:

echo AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine | tr ' ' '\n' | grep "ZZ:Z"

If you want everything back on a single line, just append | tr '\n' ' ' to convert the newlines back into spaces.

Of course you could also replace grep with sed in this solution.
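For instance, a minimal sketch of the same pipeline with sed doing the filtering. The sample line is quoted here so its double spaces survive, which means tr produces some empty lines; the address /^ZZ:Z:/ skips them. Note that this address is anchored to the start of each token, a small deviation from the unanchored grep "ZZ:Z" above.

```shell
# Sample record from the question; quoting keeps its double spaces.
line='AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine'

# tr puts each space-separated token on its own line (double spaces
# give empty lines); sed -n prints only the lines starting with ZZ:Z:.
printf '%s\n' "$line" | tr ' ' '\n' | sed -n '/^ZZ:Z:/p'
# prints:
# ZZ:Z:mus.sup
# ZZ:Z:cas.sup
```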

answered Nov 12 '16 at 07:18

Thomas Baruchel

I would appreciate it if someone could come up with a sed solution, both to avoid piping and for future reference. I was hoping to see a sed solution that uses word boundaries (\b), so that it fetches everything after ZZ:Z: but before the next boundary, something like \bZZ:Z:.*\b, but I couldn't figure out the correct sed syntax. – Roy Nov 12 '16 at 07:24
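A small sketch on the \b idea: \b is a GNU extension rather than POSIX, and even where it is available it would not stop .*, which is greedy and runs on to the last possible boundary on the line. What actually bounds each match is a negated bracket expression such as [^[:space:]]*. grep -o is used below only because it prints exactly the text a regex matched; the same bracket expression works inside sed patterns.

```shell
line='AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine'

# Greedy .* keeps going to the end of the line, giving one long match:
printf '%s\n' "$line" | grep -o 'ZZ:Z:.*'
# prints: ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine

# A negated bracket expression stops at the first whitespace character,
# so each ZZ:Z: token is matched on its own:
printf '%s\n' "$line" | grep -o 'ZZ:Z:[^[:space:]]*'
# prints:
# ZZ:Z:mus.sup
# ZZ:Z:cas.sup
```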

score 1 · Accepted Answer · edited May 23 '17 at 12:13

Try grep with the -o (or --only-matching) flag:

$ grep -o 'ZZ:Z:[^ ]*' <<< "AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine"
ZZ:Z:mus.sup
ZZ:Z:cas.sup

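Or with sed, based on this answer by @potong (it relies on GNU sed's handling of \n as a newline, so it may not work with the BSD sed shipped with macOS):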

sed 's/ZZ:Z:/\n&/g;s/[^\n]*\n\(ZZ:Z:[^ ]* \)[^\n]*/\1 /g;s/.$//'

If you have only two occurrences of the pattern per line:

sed -n 's/.*\(ZZ:Z[^ ]*\).*\(ZZ:Z[^ ]*\).*/\1 \2/p' <<< "AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine"
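A caveat worth checking with this two-group version, shown on a few made-up sample lines: a line with fewer than two matches is not printed at all (because of -n together with the p flag), and on a line with three or more matches the greedy leading .* means only the last two are kept.

```shell
# Made-up sample lines: two matches, one match, three matches.
printf '%s\n' \
  'AS:i:0  ZZ:Z:mus.sup  NM:i:0  ZZ:Z:cas.sup  CO:Z:endOfLine' \
  'AS:i:0  NM:i:0  ZZ:Z:only.one' \
  'ZZ:Z:a  ZZ:Z:b  ZZ:Z:c' |
sed -n 's/.*\(ZZ:Z[^ ]*\).*\(ZZ:Z[^ ]*\).*/\1 \2/p'
# prints:
# ZZ:Z:mus.sup ZZ:Z:cas.sup
# ZZ:Z:b ZZ:Z:c
```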

edited May 23 '17 at 12:13

Community

answered Nov 12 '16 at 08:12

SLePort

@Kenavoz: 1. The sed solution doesn't work for me (at least not in the Mac terminal). 2. Is there a way to make the grep -o solution print the two matches from each line on the SAME line, such as ZZ:Z:mus.sup ZZ:Z:cas.sup? Please don't suggest piping to tr, as in grep -o 'ZZ:Z:[^ ]*' | tr '\n' ' ', because that puts ALL the ZZ:Z matches from a multi-line file on a single line. What I need is the ZZ:Z matches of each input line collected onto one output line, e.g. ZZ:Z:mus.sup ZZ:Z:cas.sup for each line. – Roy Nov 12 '16 at 15:06
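One possible sed-only answer to both points, offered as a sketch rather than a definitive solution. It uses only POSIX sed features (no \n in the replacement, no \b), so it should also run with the BSD sed on macOS, and it prints one output line per input line with all of that line's ZZ:Z: tokens joined by single spaces. It assumes that no token in the input begins with @, because @ is used as a temporary marker; zz_per_line is just a hypothetical helper name.

```shell
# zz_per_line (hypothetical name): for each input line, print its ZZ:Z:
# tokens separated by one space. Reads stdin, or the files given as arguments.
# Assumption: no input token starts with '@' (used as a temporary marker).
# Steps:
#   1. pad the line with a space at each end
#   2. squeeze runs of spaces/tabs into one space, so every token is " tok"
#   3. mark the wanted tokens: " ZZ:Z:..." becomes " @ZZ:Z:..."
#   4. delete every token that does not start with the marker
#   5. drop the markers, then trim the leading and trailing space
zz_per_line() {
  sed -e 's/^/ /' -e 's/$/ /' \
      -e 's/[[:space:]][[:space:]]*/ /g' \
      -e 's/ ZZ:Z:/ @ZZ:Z:/g' \
      -e 's/ [^@ ][^ ]*//g' \
      -e 's/ @/ /g' \
      -e 's/^ //' -e 's/ $//' "$@"
}

printf '%s\n' 'AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine' | zz_per_line
# prints: ZZ:Z:mus.sup ZZ:Z:cas.sup
```

Lines without any ZZ:Z: token come out as empty lines, and tabs between fields are turned into single spaces. To process a whole file, pass its name as an argument, e.g. zz_per_line yourfile (a placeholder name).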
