sed remove string until next occurrence

Question

Imagine I have some chat log protocol. It could look like this:

MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF

I only want the chat between Anna and Bob, and I want each sender's name to appear only once, until the other chat partner starts writing.

What I've achieved so far is this sed script:

s/^MSG\s+(anna|bob)\|(anna|bob)\:\s{1}(.+)CRLF$/\1: "\3"/g
t end

/^.*/d

:end

This creates:

bob: "Hello anna"
bob: "How are you"
anna: "Im fine, you?"
bob: "Same, wanna hang out some time?"
anna: "Yes"
anna: "for sure"
anna: "maybe in a few weeks?"

But i want something similar to:

bob: 
  Hello anna
  How are you
anna:
  Im fine, you?
bob: 
  Same, wanna hang out some time?
anna: 
  Yes
  for sure
  maybe in a few weeks?

So, how can I delete, after one bob, all the following bobs until the next anna comes? Note that I have to use sed for this. It has to run on Ubuntu Linux systems with sed (GNU sed) 4.7, packaged by Debian.

Yes, this is literal text. It is part of the imaginary protocol definition. Since this is a text file, there is of course a `\n` at the end of the file. I already remove that `CRLF` in my short sed script. — Ulf Tietze, Nov 10 '21 at 11:33
This is going to be painful with `sed`; are you sure you can't accept a solution in Awk, or even pure shell script? — tripleee, Nov 10 '21 at 12:48
Yes, this will be painful in sed, I know that. I'm a university student, and we have to come up with a problem and solve it with sed. This is the problem I came up with. The only requirement is to use sed. — Ulf Tietze, Nov 10 '21 at 12:56
What I can propose is, capture the user name to the hold space, then append the hold space to the pattern space and check if the string after the newline is identical to the beginning of the string. Too lazy to troubleshoot this, but something like `sed '/\(anna|bob\|bob|anna\)/!d;s/^MSG [^|]*|//;G;s/^\([^:]*\): \(.*\)\n\1/\2/p;t;p;s/: .*//;h'` This gives me syntax errors on MacOS, but might work with a few tweaks on Linux. (Probably [edit] to specify your platform; nontrivial `sed` scripts are rarely portable.) — tripleee, Nov 10 '21 at 13:19
@KamilCuk it's just for training purposes, and I created this problem myself. It is not getting graded, it's just for learning sed. I'm always up for learning something new, so I try to make a challenge out of it. Which, thanks to your solution, I've definitely done. — Ulf Tietze, Nov 10 '21 at 14:35

KamilCuk · Accepted Answer · 2021-11-10T14:15:17.033

The following script:

cat <<EOF |
MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF
EOF
sed '
  # Preprocess: delete uninteresting lines, reduce the rest to name:message.
  /MSG \(\(anna\)|bob\|\(bob\)|anna\): \(.*\)CRLF/!d
  s//\2\3:\4/

  # Check whether the sender is the same as on the previous kept line.
  G   # Grab the previous name from hold space.
  /^\([^:]*\):\(.*\)\n\1$/{   # The names match?
    s//  \2/p                 # Print only the message.
    d
  }

  h    # Put the whole line into hold space. For later.
  s/^\([^:]*\):\([^\n]*\).*/\1/   # Extract only name from the line.
  x    # Put the name in hold space, and grab the full line from hold space.
  s//\1:\n  \2/     # Print the name with the message.
'

outputs:

bob:
  Hello anna
  How are you
anna:
  Im fine, you?
bob:
  Same, wanna hang out some time?
anna:
  Yes
  for sure
  maybe in a few weeks?
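
To see what the `/^\([^:]*\):\(.*\)\n\1$/` test actually compares, it can help to dump the pattern space right after `G`. This is only a sketch, assuming GNU sed and input that has already been reduced to `name:message` lines; `dump_after_G` is a made-up helper name, and the script keeps nothing but the sender name in the hold space.

```shell
# Hypothetical helper: shows the pattern space right after G for each line.
dump_after_G() {
  sed -n '
    # Append a newline plus the hold space (the previous sender name).
    G
    # Print the pattern space unambiguously: \n is the embedded newline, $ the end.
    l
    # Drop the appended part, keep only the current name, and remember it.
    s/\n.*//
    s/:.*//
    h
  '
}

printf '%s\n' 'bob:Hello anna' 'bob:How are you' 'anna:Im fine, you?' | dump_after_G
```

The first line ends in `\n$` because the hold space is still empty. The second ends in `\nbob$` and starts with `bob:`, so the back-reference `\1` matches and only the message would be printed. The third starts with `anna:` but ends in `\nbob$`, so it counts as a change of sender.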

Tsk tsk, that's a [useless `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) (-: — tripleee, Nov 10 '21 at 14:24
Buddy, your solution works like a charm. Thank you very much. I had trouble putting content into the hold space and reading it back out. I'm an absolute beginner in sed; I didn't even know it could work this way. — Ulf Tietze, Nov 10 '21 at 14:30

potong · Answer 2 · 2021-11-10T23:51:29.017

This might work for you (GNU sed):

sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/{s//\2\3:\4/;H};$!d
       x;s/(\n.*:).*(\1.*)*/\1\n&/mg;s/\n+.*:(\S)/\n  \1/mg;s/.//' file

Turn on extended regexp -E.

Gather up the anna and bob conversations in the hold space.

At the end of the file, swap to the hold space, prepend the sender's name to each run of conversation lines that follows, remove the unwanted names, and indent each line of conversation under the prepended name.

Finally remove the first newline artefact.

An alternative solution (similar to KamilCuk):

sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/!d;s//\2\3:\4/;G
        /^([^:]*:)(.*)\n\1$/{s//  \2/p;d};h;s/:.*/:/p;x;s/[^:]*:/  /;P;d' file
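
For readers who find the one-liner hard to follow, here is the same alternative spread over several lines with comments. It is meant as a reading aid rather than a different solution; it assumes GNU sed with `-E`, and the function name `group_chat` is made up for this sketch (it reads the log on stdin instead of from `file`).

```shell
# Hypothetical wrapper around the alternative above; reads the chat log on stdin.
group_chat() {
  sed -E '
    # Keep only anna<->bob lines and rewrite them as "name:message".
    /^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/!d
    s//\2\3:\4/

    # Append the hold space: the previous "name:" (empty for the first line).
    G

    # Same sender as before: print only the indented message, end the cycle.
    /^([^:]*:)(.*)\n\1$/{
      s//  \2/p
      d
    }

    # New sender: save the line, print "name:", then swap so that the
    # hold space keeps "name:" and the pattern space has the saved line.
    h
    s/:.*/:/p
    x

    # Replace "name:" with the indent and print up to the first newline.
    s/[^:]*:/  /
    P
    d
  '
}

printf '%s\n' \
  'MSG bob|anna: Hello annaCRLF' \
  'MSG bob|anna: How are youCRLF' \
  'MSG bob|peter: hey im asking anna to hang out lolCRLF' \
  'MSG anna|bob: Im fine, you?CRLF' | group_chat
```

The main difference from KamilCuk's script is that the hold space keeps the name together with its colon, so the back-reference compares `bob:` with `bob:` rather than `bob` with `bob`.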

score 1 · Answer 3 · 2021-11-12T14:16:17.147

This uses POSIX sed syntax.

sed '
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//;t t
:t
H;x;s/^\([^:]*:\n\).*\1//;t
g' file

It appends the current record to the previous one in the hold space, swaps them, removes duplicate names (along with the previous record), or else reverts the pattern space back to the original current record.
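
The duplicate-removing substitution can be tried in isolation. The sketch below is not part of the script above: it uses `N` to glue two already formatted records (four lines) into one pattern space, which imitates what `H;x` produces, and then applies the same `s` command. `dedupe_pair` is a hypothetical name.

```shell
# Hypothetical helper: joins four input lines (two "name:" + message records)
# into one pattern space and applies the duplicate-name removal.
dedupe_pair() {
  sed 'N;N;N;s/^\([^:]*:\n\).*\1//'
}

# Same sender twice: the previous record and the repeated name are removed.
printf '%s\n' 'bob:' '  Hello anna' 'bob:' '  How are you' | dedupe_pair

# Different senders: nothing matches, so all four lines pass through unchanged.
printf '%s\n' 'bob:' '  How are you' 'anna:' '  Im fine, you?' | dedupe_pair
```

In the real script the second case is where `t` does not branch, so `g` puts the untouched current record back into the pattern space.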

Here's a more efficient version:

sed '
t
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//
H;s/:.*/:/
x;s/^\([^:]*:\n\)\1//p;D' file

This avoids the use of .* in the duplicate detecting regexp by using the hold space to store the previous name rather than the entire previous record.
