0

This sed command deletes all lines containing {$pattern} in file.txt:

sed -i "/{$pattern}/d" ./file.txt

Currently, this deletes all of the lines in this sample file.txt when $pattern is set to "cat":

One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
So the {cat} and the {owl} ate {cereal}.

I need to change this, such that it does not delete the first found line containing {$pattern}, but only deletes all subsequent appearances found in later lines:

E.g., suppose file.txt is this:

One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
So the {cat} and the {owl} ate {cereal}.

If $pattern were set to "cat", the output would look like this, because line 1 has the first occurrence of "cat", and all other later lines also have it, so they are all deleted:

One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"

If $pattern were set to "owl", the output would look like this, because line 1 contains the first occurrence of "owl", and lines 2 and 4 also have instances, which are deleted:

One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
"Yes, I would," replied the {cat}.

If $pattern were set to "cereal", the output would look like this with line 4 deleted, because line 2 contains the first occurrence of "cereal" and line 4 is the only later line that also contains it:

One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
  • The file must not be sorted, as the order of lines is important.
  • Note that line 1 contains two copies of "owl", but the second appearance on that line is not regarded as the second appearance of "owl".

Is there any way to make sed keep the line containing the first appearance of {$pattern} and delete only the later lines in which the pattern appears?

Village
  • Related? http://stackoverflow.com/questions/17910718/how-to-delete-the-matching-pattern-from-given-occurrence – fedorqui Mar 07 '14 at 13:23
  • Yes, that nearly answers, but I find `sed -i "/$line/{2,$d}"` results in `sed: -e expression #1, char 33: unexpected ','`. `{2,$d}` seems incompatible with double-quotes on `sed`, but `$line` does not seem to work when single-quotes are used. – Village Mar 07 '14 at 13:31
  • 1
    Do you want to delete every line that contains $line, or every occurrence of $line in the file (in both cases, starting from the 2nd occurrence)? Your question describes removing the content, but the sample shows whole lines being removed, and the replies go that way too. – NeronLeVelu Mar 07 '14 at 14:00
  • If "$line" contained the text ".*", what would you want to happen? What if "$line" was set to "her" and you hit a line that contained "there" - is that a matching line or not? Do you want to delete the lines that contain the pattern, or only the lines that exactly match the pattern, or do you want to delete from the pattern to the end of the line, or something else? Please post some representative sample input and expected output showing these and other edge cases that you care about. – Ed Morton Mar 07 '14 at 19:05
  • If "$line" is set to "her", this would only match items marked as "{her}", not anything like "there", or "{there}", but "t{her}e" would be regarded as a match. Note that "{" and "}" are marks I've placed in the text previously to make it easier for this script to find only the correct items and to ignore other text. In other words, it does not need to know what a word boundary is. – Village Mar 08 '14 at 01:54
  • It should never do anything to the line containing the first occurrence, even if the second occurrence of the matching pattern is in that same first line. It should only delete those later lines also having appearances. – Village Mar 08 '14 at 01:56
  • ".*" does not appear anywhere in my input file, and I don't know which choice is best or what kind of answer would benefit others. – Village Mar 08 '14 at 01:57

5 Answers

3

This might work for you (GNU sed):

sed '/{'"$var"'}/{x;//{x;d};x;h}' file

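The hold space acts as a switch. The first line matching {$var} is copied into the hold space and printed as usual. On every later matching line, the hold space already matches, so that line is deleted.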
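As a quick check, the command can be run against the sample file from the question for each of the three patterns. This is only a sketch: the temporary directory and the out_*.txt file names are assumptions made for the demonstration, and GNU sed is assumed, as in the answer.

```shell
# Recreate the sample file from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file.txt" <<'EOF'
One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
So the {cat} and the {owl} ate {cereal}.
EOF

# Run the command once per pattern, writing each result to its own file
# (out_<pattern>.txt is a hypothetical name used only for this demo).
for var in cat owl cereal; do
    sed '/{'"$var"'}/{x;//{x;d};x;h}' "$tmp/file.txt" > "$tmp/out_$var.txt"
    printf '%s: %s line(s) kept\n' "$var" "$(wc -l < "$tmp/out_$var.txt" | tr -d ' ')"
done
```

With this input, the three runs should keep 1, 2, and 3 lines respectively, which matches the expected outputs listed in the question.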

potong
2

I should have tested this more. As potong correctly pointed out in the comments, the command below does not do what was asked:

This will skip over every $line and delete everything else.

According to http://www.grymoire.com/Unix/Sed.html#uh-35a, you could do:

sed -i '
    /{$line}/,$ {
        /{$line}/n # skip over the line that has "{$line}" on it
        d
    }
' ./file.txt
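A range-based variant that does keep the first match can be sketched with GNU sed's `0,/regexp/` address form. This is not the command from the answer above but a different technique, shown here only for comparison. It assumes GNU sed, and the temporary file names are made up for the demonstration.

```shell
# Recreate the sample file from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file.txt" <<'EOF'
One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
So the {cat} and the {owl} ate {cereal}.
EOF

line=owl   # same variable name as in the answer above

# 0,/re/ (a GNU extension) covers everything up to and including the first
# match; "b" skips the rest of the script for those lines, so they are printed.
# After that range has ended, any further matching line is deleted.
# The variable is spliced in between single-quoted pieces so the shell expands it.
sed -e '0,/{'"$line"'}/b' -e '/{'"$line"'}/d' "$tmp/file.txt" > "$tmp/out.txt"
cat "$tmp/out.txt"
```

Note that the variable has to sit outside the single quotes here, because the shell does not expand `$line` inside them.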
davir
2

You need to escape the $d, otherwise it gets treated as a shell variable and expanded by the shell (possibly to an empty string):

sed -i "/{$line}/{2,\$d}" ./file.txt
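The effect of the backslash can be seen without running sed at all, by building the script string both ways and printing it. This is a small sketch; the variable `d` is set to an empty string here as an assumption, to mimic the common case where no such shell variable exists.

```shell
d=""   # assumption: no meaningful shell variable named d is set

unescaped="/{cat}/{2,$d}"    # the shell expands $d before sed ever sees it
escaped="/{cat}/{2,\$d}"     # the backslash keeps a literal $d for sed

printf '%s\n' "$unescaped" "$escaped"
```

The first string ends in `{2,}`, which is consistent with the "unexpected ','" error reported in the comments on the question, while the second keeps `$d` intact.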
Adrian Frühwirth
  • I tried running this, but found it seems to be ignoring the `{` and `}` and just erasing any lines with `$line`. – Village Mar 07 '14 at 13:39
2

Alternative using awk:

$ awk "/$line/&&c++ {next} 1" ./file.txt
Josh Jolly
  • 2
    You should use variable with `awk` like this: `awk -v var="$line" '$0~var&&c++ {next} 1' ./file.txt` – Jotne Mar 07 '14 at 13:54
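Building on the `-v` suggestion, one possible variation passes the marker including its braces and matches it as a plain string with awk's `index()` function instead of as a regular expression. This is a sketch rather than either commenter's exact command; the temporary file names are assumptions for the demonstration.

```shell
# Recreate the sample file from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file.txt" <<'EOF'
One day, the {cat} said to the {owl}, "What are you eating for breakfast, {owl}?"
The {owl} replied, "I am eating {cereal}. Would you like some, {cat}?"
"Yes, I would," replied the {cat}.
So the {cat} and the {owl} ate {cereal}.
EOF

line='{owl}'   # the braces are part of the marker being searched for

# index() returns a non-zero position when var occurs in the line.
# c++ evaluates to 0 on the first matching line (so it is printed) and to a
# non-zero value on every later matching line (so "next" skips it).
# The trailing 1 prints every line that was not skipped.
awk -v var="$line" 'index($0, var) && c++ { next } 1' "$tmp/file.txt" > "$tmp/out.txt"
cat "$tmp/out.txt"
```

Because `index()` does a literal substring search, characters such as `.` or `*` in the marker are not treated as regex operators, although `-v` still interprets backslash escapes in the assigned value.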
2

Using GNU sed:

sed -ne "/$line/{2,\$p}" -e "/$line/! {p}" file
jaypal singh
  • I don't think this does what you think it does. To me it says `if a line contains $line and is between 2 and the end-of-file, print it. Otherwise if it does not contain $line print it.` The OP wants the first occurrence of $line printed and the rest suppressed. – potong Mar 08 '14 at 09:24