unix sed substitute nth occurrence misfunction?

Question

Let's say I have a string which contains multiple occurrences of the letter Z, for example aaZbbZccZ. I want to print parts of that string, each time up to the next occurrence of Z:

aaZ
aaZbbZ
aaZbbZccZ

So I tried using sed for this, with the command sed s/Z.*/Z/i, where i is an index running from 1 to the number of Z's in the string. As far as I understand sed, this should delete everything that comes after the i-th Z. In practice it only works for i=1, as in sed s/Z.*/Z/, but not as I increment i: sed s/Z.*/Z/2, for example, just prints the entire original string. It feels as if I am missing something about how sed works, since according to multiple manuals it should work.

Edit: for example, when applying sed s/Z.*/Z/2 to the string aaZbbZccZ, I expect to get aaZbbZ, as everything after the 2nd occurrence of Z gets deleted.

The first occurrence matches up to the end of the string, so there is no second occurrence left to match. — Aaron, Apr 12 '18 at 16:31
Is there a particular reason you want to use `sed` for this job? Particularly if you're operating on a string variable as opposed to a file, this is arguably a job for shell-builtin string primitives. — Charles Duffy, Apr 12 '18 at 16:39
With awk: `echo 'aaZbbZccZ' | awk -F 'Z' 'END{OFS=FS; c=NF; line=$0; for(i=1; i — Cyrus, Apr 12 '18 at 18:19
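The first two comments can be checked with a short sketch. It first shows that `s/Z.*/Z/2` leaves the line untouched, because the greedy `Z.*` produces a single match that reaches the end of the line. It then prints the wanted prefixes with shell parameter expansion only, as one possible reading of the "shell-builtin string primitives" suggestion. The function name `z_prefixes` is hypothetical, and the variables are global because POSIX sh has no `local`.

```shell
#!/bin/sh
# 'Z.*' is greedy: the single match runs from the first Z to the end of the
# line, so there is no second match for the "2" flag to act on.
echo 'aaZbbZccZ' | sed 's/Z.*/Z/2'    # the line comes out unchanged

# Hypothetical helper: print every prefix of $1 that ends in a Z,
# using only POSIX parameter expansion (no sed, no awk).
z_prefixes() {
    rest=$1
    prefix=
    while :; do
        # Stop as soon as no Z is left in the unprocessed part.
        case $rest in
            *Z*) ;;
            *) break ;;
        esac
        head=${rest%%Z*}          # text before the first remaining Z
        prefix=$prefix${head}Z    # grow the prefix up to and including that Z
        rest=${rest#*Z}           # drop everything through that Z
        printf '%s\n' "$prefix"
    done
}

z_prefixes 'aaZbbZccZ'
```

Any text after the last Z (for example the `dd` in `aaZbbZccZdd`) is never printed, since the loop stops once no Z remains.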

KarolK · Answer 1 · 2018-04-12T17:02:06.650

The sed command below comes close to what you are looking for, except that it also removes the last Z.

$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//1g;s/$/Z/'
aaZ

$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//2g;s/$/Z/'
aaZbbZ

$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//3g;s/$/Z/'
aaZbbZccZ

$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//4g;s/$/Z/'
aaZbbZccZddZ

Edit: Modified according to Aaron's suggestion.

Edit2: If you don't know how many Z's there are in the string, it is safer to use the command below. Otherwise an additional Z is appended at the end.
-r - enables extended regular expressions
-e - adds a sed expression; several -e options behave like commands separated by ;, but are easier to read in my opinion.

$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//1g' -e 's/([^Z])$/\1Z/'
aaZ

$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//2g' -e 's/([^Z])$/\1Z/'
aaZbbZ

$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//3g' -e 's/([^Z])$/\1Z/'
aaZbbZccZ

$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//4g' -e 's/([^Z])$/\1Z/'
aaZbbZccZddZ

$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//5g' -e 's/([^Z])$/\1Z/'
aaZbbZccZddZ
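To get all the prefixes from the question in one go, the first command of this answer can be wrapped in a loop over the index. The following is only a sketch: the function name `z_prefixes_sed` is hypothetical, and it assumes GNU sed, since POSIX leaves the combination of a number with the `g` flag unspecified and the behaviour relied on here (replace the i-th match and all later ones) is GNU sed's.

```shell
#!/bin/sh
# Hypothetical helper: print the prefix of $1 ending at the 1st, 2nd, ... Z.
# Assumes GNU sed for the combined "<number>g" substitution flag.
z_prefixes_sed() {
    # Count the Z characters so the loop never asks for a match that is absent
    # (which is the case where the plain 's/$/Z/' would add a spurious Z).
    n=$(printf '%s' "$1" | tr -cd 'Z' | wc -c)
    i=1
    while [ "$i" -le "$n" ]; do
        # Delete the i-th "Z plus following non-Z run" and all later ones,
        # then put back the Z that was deleted along with them.
        printf '%s\n' "$1" | sed "s/Z[^Z]*//${i}g;s/\$/Z/"
        i=$((i + 1))
    done
}

z_prefixes_sed 'aaZbbZccZ'
```

Because the loop stops at the number of Z's actually present, the simpler first variant is enough here and the `-r` form from Edit2 is not needed.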

Naoric · Answer 2 · 2018-04-12T18:18:12.857

This should do what you expect (see comments) unless your string can contain line breaks:

# -n will prevent default printing
echo 'aaZbbZccZ' | sed -n '{
    # Add a line break after each Z
    s/Z/Z\
/g
    # Print it and consume it in the next sed command
    p
}' | sed -n '{
    # Save only the first line in the hold buffer
    # (this special case avoids printing a blank first line)
    1 {
        h
    }
    # As for the rest of the lines
    2,$ {
        # Swap the hold buffer and the pattern space
        x
        # Remove the line break
        s/\n//
        # Print the aggregated result
        p
        # Swap back to get the current line again
        x
        # And append it, after a newline, to the hold buffer
        H
    }
}'

The idea is to break the string into multiple lines (each terminated with Z), which are then processed one by one by the second sed command.

In the second sed we use the hold buffer to remember the previous lines: for each new line we print the aggregated result so far, append the new line, and remove the line break we previously added.

And the output is

aaZ
aaZbbZ
aaZbbZccZ
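To see what the second sed actually receives, the first stage can be run on its own. This is a small sketch with a hypothetical helper name, `split_after_Z`. Note the empty fourth line: the newline added after the last Z leaves a blank line at the end, and reading that line is what makes the second sed print the complete string.

```shell
#!/bin/sh
# Hypothetical helper: stage one only, a newline inserted after every Z.
split_after_Z() {
    printf '%s\n' "$1" | sed 's/Z/Z\
/g'
}

# Prints aaZ, bbZ, ccZ and then one empty line.
split_after_Z 'aaZbbZccZ'

# Count the lines that stage two will read ('$=' prints the last line number).
split_after_Z 'aaZbbZccZ' | sed -n '$='
```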

potong · Answer 3 · 2018-04-15T23:48:12.827

This might work for you (GNU sed):

sed -n 's/Z/&\n/g;:a;/\n/P;s/\n\(.*Z\)/\1/;ta' file

Use sed's grep-like option -n to suppress automatic printing, so that only explicitly printed content appears. Append a newline after each Z. If there were no substitutions, there is nothing to be done. Otherwise print up to the first newline, remove that newline if the characters following it contain a Z, and repeat.
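The same script can be written with one `-e` per command and fed a string instead of a file, which makes the loop easier to follow. This is a sketch with a hypothetical wrapper name, `z_prefixes_loop`. It assumes GNU sed, as the answer does, because `\n` in the replacement is used to insert a newline.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above; assumes GNU sed.
# Commands, in order:
#   s/Z/&\n/g          put a newline after every Z
#   :a                 loop label
#   /\n/P              if a newline is present, print up to the first one
#   s/\n\(.*Z\)/\1/    drop the first newline if a Z still follows it
#   ta                 repeat while that substitution keeps succeeding
z_prefixes_loop() {
    printf '%s\n' "$1" | sed -n \
        -e 's/Z/&\n/g' \
        -e ':a' \
        -e '/\n/P' \
        -e 's/\n\(.*Z\)/\1/' \
        -e 'ta'
}

z_prefixes_loop 'aaZbbZccZ'
```

For the question's string this prints aaZ, aaZbbZ and aaZbbZccZ on separate lines.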

edited Apr 15 '18 at 23:48

answered Apr 13 '18 at 07:34
