Sed/Awk to delete second occurrence of string - platform independent

Question

I'm looking for a one-liner in bash that works on both Linux and OS X and removes the second line containing a given string:

Header
1
2
...
Header
10
11
...

Should become

Header
1
2
...
10
11
...

My first attempt was using the deletion option of sed:

sed -i '/^Header.*/d' file.txt

But that removes the first occurrence as well.

The question "How to delete the matching pattern from given occurrence" suggests using something like this:

sed -i '/^Header.*/{2,$d}' file.txt

But on OS X that gives the error

sed: 1: "/^Header.*/{2,$d}": extra characters at the end of d command

Next, I tried substitution, where I know how to use 2,$, followed by deleting the resulting empty lines:

sed -i '2,$s/^Header.*//' file.txt
sed -i '/^\s*$/d' file.txt

This works on Linux, but on OS X, as mentioned in "sed command with -i option failing on Mac, but works on Linux", you have to use

sed -i '' '2,$s/^Header.*//' file.txt
sed -i '' '/^\s*$/d' file.txt

And this one, in turn, doesn't work on Linux.

So my question is: isn't there a simple way to make this work in any Bash? It doesn't have to be sed, but it should be as shell-independent as possible, and I need to modify the file itself.

I suggest perl for this. – Gabe Kopley Aug 04 '15 at 16:58

score 3 · Answer 1 · answered Aug 04 '15 at 17:00

Since this is file-dependent and not line-dependent, awk can be a better tool.

Just keep a counter on how many times this happened:

awk -v patt="Header" '$0 == patt && ++f==2 {next} 1' file

This skips a line only when it is exactly equal to the given string and this is the second time such a line has been seen. All other lines are printed unchanged.

fedorqui

It doesn't work when using patt="Header.*", and it also writes to stdout, while I need to modify the file itself. How would it have to be modified to achieve this? – Anonymous Aug 04 '15 at 17:09
Hardcode then `/Header.*/`. To do in place editing, `awk ... file > tmpfile && mv tmpfile file` – fedorqui Aug 04 '15 at 19:29
@Max Adding `.*` to a regexp comparison is useless as it matches zero or more occurrences of any character so it will match exactly the same lines as just `Header` alone. In this case it MAY be overkill but if you like it you could keep the variable and just change `==` to `~` to do a regexp instead of string comparison to find `Header` in a line as opposed to when it's the whole line. – Ed Morton Aug 04 '15 at 21:10
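
Putting the two comments above together, a minimal sketch might look like the following. It assumes the goal is to match any line that starts with Header, not only a line that is exactly Header. The names demo.txt and tmp and the pattern value '^Header' are chosen here for illustration and do not come from the thread.

```shell
# Sketch only: demo.txt and the temporary file name are illustrative.
printf '%s\n' 'Header A' 1 2 'Header B' 10 11 > demo.txt

# Create the temporary file next to the target; an explicit template
# is used because it is accepted by both GNU and BSD mktemp.
tmp=$(mktemp ./demo.XXXXXX) || exit 1

# "~" does a regexp match instead of a string comparison, so any line
# starting with "Header" counts. The second such line is skipped and
# every other line is printed.
awk -v patt='^Header' '$0 ~ patt && ++f == 2 { next } 1' demo.txt > "$tmp" &&
  mv "$tmp" demo.txt

cat demo.txt
```

Note that mktemp creates its file with restrictive permissions, so after the mv the file may no longer have the mode of the original.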

score 1 · Answer 2 · answered Aug 04 '15 at 20:26

I would recommend using awk for this:

awk '!/^Header/ || !f++' file

This prints all lines that don't start with "Header", plus the first line that does. Short-circuit evaluation means that if the left-hand side of the || is true, the right-hand side isn't evaluated. If the line does start with Header, the second part, !f++, is true only the first time, so every later Header line is dropped, not just the second one.

$ cat file
baseball
Header and some other stuff
aardvark
Header for the second time and some other stuff
orange
$ awk '!/^Header/ || !f++' file
baseball
Header and some other stuff
aardvark
orange
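
If only the second Header line should go and a third one should stay, the counter test could be changed. The following is a small sketch comparing the two conditions on made-up input with three header lines. The file name demo.txt and the variables first_only and second_dropped are illustrative.

```shell
# Three header lines, to make the difference between the conditions visible.
printf '%s\n' 'Header one' a 'Header two' b 'Header three' c > demo.txt

# As in the answer: only the first Header line survives.
first_only=$(awk '!/^Header/ || !f++' demo.txt)

# Variant: only the second Header line is dropped; the first and third stay.
second_dropped=$(awk '!/^Header/ || ++f != 2' demo.txt)

printf '%s\n' "$first_only" '---' "$second_dropped"
```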

score 1 · Answer 3 · answered Aug 04 '15 at 21:49

This might work for you (GNU sed):

sed -i '1b;/^Header/d' file

Ignore the first line and then remove any occurrence of a line beginning with Header.

To remove subsequent occurrences of the first line regardless of the string, use:

sed -ri '1h;1b;G;/^(.*)\n\1$/!P;d' file
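
For the first command, a variant that may also run on the BSD sed shipped with OS X is sketched below. It rests on two assumptions: that both GNU and BSD sed accept a backup suffix attached directly to -i, and that splitting the script into separate -e expressions avoids the differences in how ";" is handled after the b command. The file name demo.txt is made up, and as in the answer, the header is expected on line 1.

```shell
# Sketch only: demo.txt is a hypothetical file name used for illustration.
printf '%s\n' Header 1 2 Header 10 11 > demo.txt

# -i.bak (suffix attached, no space) asks for an in-place edit with a backup.
# '1b' skips line 1; '/^Header/d' deletes every later line starting with Header.
sed -i.bak -e '1b' -e '/^Header/d' demo.txt && rm -f demo.txt.bak

cat demo.txt
```

The backup file is removed right away here; keeping it until the result has been checked is the more cautious choice.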
