sed: How to delete second match in a file

Question

I have a file that looks like this (pseudocode):

---
foo: bar
bar: baz
---
baz: quz
---
Some text
Some text
Some text

I need to delete the second --- line, and only that one. I know that sed can do this, but I have never been able to make heads or tails of any sed documentation I could find...

Wintermute · Accepted Answer · 2015-02-28T20:39:10.810

7

With sed the easiest way would be to first read the whole file into the pattern space and work on that:

sed ':a $!{N; ba}; s/\(^\|\n\)---\n/\n/2' filename

This does:

:a                       # jump label for looping
$!{                      # if the end of input is not reached
  N                      # fetch the next line, append it to the pattern space
  ba                     # go back to :a
}                        # after this, the whole file is in the pattern space.
s/\(^\|\n\)---\n/\n/2    # then: remove the second occurrence of a line that
                         # consists only of ---

@mklement0 points out that the \| only works with GNU sed. A way to work around that, since the \| is only necessary to catch --- in the first line, would be

sed ':a $!{ N; ba; }; s/^/\n/; s/\n---\n/\n/2; s/^\n//' filename

This does:

:a $!{ N; ba; }  # read file into the pattern space
s/^/\n/          # insert a newline before the first line
s/\n---\n/\n/2   # replace the second occurrence of \n---\n with \n
s/^\n//          # remove the newline we put in at the beginning.

This way, the first line is no longer a special case.
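
To check the workaround against the sample from the question, here is a minimal sketch. It assumes GNU sed (the `\n` in the replacement of `s/^/\n/` is not guaranteed to mean a newline elsewhere). The `sample` helper and the `slurp_result` variable are made up for this illustration, and the script is split over several lines only for readability.

```shell
#!/bin/sh
# Sample input from the question: three lines that consist only of ---.
sample() {
  printf '%s\n' '---' 'foo: bar' 'bar: baz' '---' 'baz: quz' '---' \
    'Some text' 'Some text' 'Some text'
}

# Same commands as the workaround, one per line.
# Assumption: GNU sed, which treats \n in the replacement as a newline.
slurp_result=$(sample | sed '
  :a
  $!{
    N
    ba
  }
  s/^/\n/
  s/\n---\n/\n/2
  s/^\n//
')

printf '%s\n' "$slurp_result"
```

The output should be the sample with only its second --- line missing.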

Without reading the whole file into a buffer, you'll have to construct a counter from characters:

sed '/^---$/ { x; s/.*/&_/; /^__$/ { x; d; }; x; }' filename

That is:

/^---$/ {    # if a line is ---
  x          # exchange pattern space and hold buffer
  s/.*/&_/   # append a _ to what was the hold buffer
  /^__$/ {   # if the counter now holds exactly two underscores
    x        # swap back
    d        # delete the line
  }
  x          # otherwise just swap back.
}
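
A quick way to convince yourself that the counter removes only the second match, and not every second one, is to feed it more than three markers. This is a sketch with a made-up input; `four_markers` and `counter_result` are illustrative names, not part of the original command.

```shell
#!/bin/sh
# Hypothetical input with four lines that consist only of ---.
four_markers() {
  printf '%s\n' '---' 'a' '---' 'b' '---' 'c' '---' 'd'
}

# The hold space collects one _ per match; only the state "__" triggers d.
counter_result=$(four_markers |
  sed '/^---$/ { x; s/.*/&_/; /^__$/ { x; d; }; x; }')

printf '%s\n' "$counter_result"
```

The third and fourth markers survive because the hold space then contains three and four underscores, which `/^__$/` does not match.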

...or just use awk:

awk '!/^---$/ || ++ctr != 2' filename
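
For readers who find the one-liner terse, roughly the same logic can be spelled out with an explicit action. This is only a sketch: the `n` variable, the `seen` counter name and the `sample` helper are additions for illustration, while the command above hard-codes 2.

```shell
#!/bin/sh
# Sample input from the question.
sample() {
  printf '%s\n' '---' 'foo: bar' 'bar: baz' '---' 'baz: quz' '---' \
    'Some text' 'Some text' 'Some text'
}

# n is an illustrative parameter: which match to drop.
awk_result=$(sample | awk -v n=2 '
  /^---$/ && ++seen == n { next }  # count each --- line, skip the n-th
  { print }                        # print every other line unchanged
')

printf '%s\n' "$awk_result"
```

Because `&&` short-circuits, `seen` is incremented only on lines that are exactly ---, which is the same trick the one-liner relies on with `||`.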

edited Feb 28 '15 at 20:39

answered Feb 28 '15 at 18:55

Wintermute

Kudos for the elegant `awk` solution (stopped my head from spinning after trying to understand the 2nd `sed` solution). Note that your `sed` solutions assume _GNU_ sed (BSD `sed` will choke on them; the first one uses `\|` alternation, which POSIX basic regexes don't support at all (unfortunately); the 2nd requires terminating `;` chars. before the closing `}` chars. to make BSD `sed` happy). – mklement0 Feb 28 '15 at 20:15
@mklement0 Writing portable sed is hard. I hope the workaround I edited in for the first version works with BSD sed; I don't have one lying around to test it right now. It works with `sed --posix`, anyway. I also edited in the semicolons in the second, although I'm not sure that's enough. I remember BSD sed being very, very picky about braces. – Wintermute Feb 28 '15 at 20:43
Sadly, BSD sed is finicky about a great many things: lack of support for control-character sequences, needing line breaks after label names and branching commands, … - see http://stackoverflow.com/a/24276470/45375 (pardon the plug). Your 2nd command works now, but this is what your 1st command must look like to work with BSD sed; note the selective use of ANSI C-quoted strings; to make it clearer what happens, I've broken it into multiple `-e` expressions (unfortunately, I'm not kidding): `sed -e $':a\n$!{N; ba\n}' -e $'s/^/\\\n/' -e 's/\n---\n/\'$'\n''/2' -e 's/^\n//' filename`. – mklement0 Feb 28 '15 at 21:26
Best to just put it in a file with `#!/bin/sed -f` at the top, I think. – Wintermute Feb 28 '15 at 21:30
Or use your simple, elegant `awk` solution :) – mklement0 Feb 28 '15 at 21:31
If you use GNU sed there is a flag `M` on regex's that uses `^` and `$` on multilines i.e. `s/^---\n//M2` removes the second occurrence of `---` at the start of a line followed by a newline. However slurping a whole file into memory may not always be practical. – potong Jul 29 '18 at 12:42

score 2 · Answer 2 · answered Feb 28 '15 at 20:39

sed is for simple substitutions on a single line. For anything else you should just use awk:

$ awk '!(/^---$/ && ++cnt==2)' file
---
foo: bar
bar: baz
baz: quz
---
Some text
Some text
Some text

answered Feb 28 '15 at 20:39

Ed Morton

Fully agreed. I was tempted to create a similar answer, until I noticed that @Wintermute had included `awk '!/^---$/ || ++ctr != 2' filename` in his answer... :) – mklement0 Feb 28 '15 at 20:57
Oh, so he did. I missed that, thought it was all about sed. Oh well, I think my logic is clearer so I'll leave it for now and if @Wintermute updates to use the above logic I'll delete my answer. – Ed Morton Feb 28 '15 at 21:02
It's kind of a toss-up, I think -- both ways are fine. I have no qualms about adding your way to my answer, if you prefer it that way. Do you prefer it that way? – Wintermute Feb 28 '15 at 21:32
Yeah, maybe it's just me but I find `!(a && b)` much easier to understand than `!a || !b`. No need to mention me in your answer, just add the code as an alternative. Thanks. – Ed Morton Feb 28 '15 at 21:44
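
The two spellings discussed in these comments are the same condition by De Morgan's law, and that is easy to check. A small sketch, using a made-up input with four markers; `four_markers`, `form_or` and `form_and` are illustrative names.

```shell
#!/bin/sh
# Hypothetical input with four lines that consist only of ---.
four_markers() {
  printf '%s\n' '---' 'a' '---' 'b' '---' 'c' '---' 'd'
}

# !a || !b form, as in the accepted answer.
form_or=$(four_markers | awk '!/^---$/ || ++ctr != 2')

# !(a && b) form, as in this answer.
form_and=$(four_markers | awk '!(/^---$/ && ++cnt==2)')

if [ "$form_or" = "$form_and" ]; then
  echo "both forms print the same lines"
fi
```

In both forms the counter is incremented only when the line matches, thanks to short-circuit evaluation of `||` and `&&`.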

score 2 · Answer 3 · answered Feb 28 '15 at 20:41

Here's some spaghetti sed code (uses goto)

sed '/^---/ {:a;n;/^---/{d;bb};ba;:b}' file

with commentary

sed '/^---/ {      # at the first match
    :a             # label "a"
    n              # get the next line of input
    /^---/{d;bb}   # if it matches, delete the line; d ends the cycle, so bb is never reached
    ba             # branch to "a" (goto)
    :b             # label "b"
}' file

But I'll add my opinion that using sed for anything complicated leads to unmaintainable code. Use awk or perl. Thanks for the opportunity to show off though ;)
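
One caveat worth checking before relying on the loop: because `d` starts a new cycle, the `/^---/` address at the top can fire again on a later match. The sketch below uses a made-up input with four markers and a trimmed equivalent of the script (the `bb` and `:b` pair is dropped, since `d` always runs first). `four_markers` and `loop_result` are illustrative names.

```shell
#!/bin/sh
# Hypothetical input with four lines that consist only of ---.
four_markers() {
  printf '%s\n' '---' 'a' '---' 'b' '---' 'c' '---' 'd'
}

# Trimmed equivalent of the loop: print lines until the next match, delete it.
loop_result=$(four_markers | sed '/^---/ {
  :a
  n
  /^---/d
  ba
}')

printf '%s\n' "$loop_result"
```

With this input the first and third --- lines are kept and the second and fourth are dropped, so the loop fits files like the one in the question, which have at most three markers.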

potong · Answer 4 · 2018-07-29T14:08:07.610

1

This might work for you (GNU sed):

sed '/^---/{x;s/^/n/;/^n\{2\}$/{x;d};x}' file

Keep a counter in the hold space. Each time a line beginning with --- is encountered, add one n to the counter; if the counter is then exactly two, delete the current line.
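
Note that `/^---/` counts any line that starts with ---, not only lines that consist of it. A minimal sketch of the difference, with a made-up input; `markers`, `loose` and `strict` are illustrative names.

```shell
#!/bin/sh
# Hypothetical input: the middle line starts with --- but has more text.
markers() {
  printf '%s\n' '---' '--- draft' '---'
}

# Unanchored at the end: matches all three lines.
loose=$(markers | sed -n '/^---/p')

# Anchored at both ends: matches only the two bare --- lines.
strict=$(markers | sed -n '/^---$/p')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

If such lines can occur in the file, adding the `$` anchor to both addresses keeps the counter honest.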

edited Jul 29 '18 at 14:08

answered Jul 28 '18 at 14:26

potong

score -1 · Answer 5 · edited May 23 '17 at 12:22

See Sed replace every nth occurrence

The solution uses awk rather than sed, but "use the right tool for the job". It may or may not be possible to do in sed but, even if it is, it will be a lot easier in a tool like awk or perl.

edited May 23 '17 at 12:22

Community

answered Feb 28 '15 at 18:37

catlan
