SED - removing string followed by LineFeed (\n)

Question

I can't find a suitable sed expression to remove a word that is followed by a newline (\n).

Test file is:

line1\n
line2\n
line3mark\n
line4\n
line5\n

and I want to remove all occurrences of mark\n, leaving in this case:

line1\n
line2\n
line3line4\n
line5\n

I have searched and found that I can use:

sed 's/\n//g' test.file  to remove ALL \n's

but

sed 's/mark\n//g' test.file does not work

Strangely, s/mark\n//g does seem to work ok in vi in interactive mode.

Any help greatly appreciated! I would like to understand how to do it using sed if possible, as I am sure it is possible! However, if it can be done another way then I'm also happy, as long as it's on the command line, as it has to run over many files.

Many thanks.

@thiton: doesn't matter, `sed` will append a newline when printing the pattern-space. — ninjalj, Nov 15 '11 at 20:19

score 9 · Answer 1 · edited Apr 25 '18 at 15:13

This should do the trick:

sed -i ':a;N;$!ba;s/mark\n//g' file

Explanation:

;    command separator within sed
:a   defines a label named a, like a goto label in C/C++
N    appends a newline and the next input line to the pattern space
$!ba if the current line is not the last one ($!), branches (b) back to label a

sed proceeds like this: it reads one line of input into the pattern space (without its trailing newline), performs the sequence of editing commands on the pattern space, then writes the pattern space to standard output followed by a newline, and starts over with the next line.

When you do something like

sed -i 's/mark\n//' file

lines are copied to the pattern space one by one, so the pattern space never contains a \n and mark\n can never match.

:a;N;$!ba appends every following line to the pattern space, so it ends up holding the whole file with the \n characters embedded.

Then the pattern space can be processed in one pass, removing every mark\n. The g flag (global) is important here because it tells sed not to stop at the first match.
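A minimal sketch of the difference, assuming a POSIX shell and a sed that accepts \n in the pattern (GNU sed does). The temporary file and the helper name join_mark are made up for illustration; the script is the same as above, only split into -e fragments.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf 'line1\nline2\nline3mark\nline4\nline5\n' > "$sample"

# One line at a time: the pattern space never holds a newline,
# so mark\n cannot match and the input is printed unchanged.
sed 's/mark\n//g' "$sample"

# Hypothetical helper: gather all lines first, then substitute.
# Each -e fragment ends at a newline, which also terminates the label.
join_mark() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/mark\n//g' "$@"
}
join_mark "$sample"

rm -f "$sample"
```

The final newline of the input is never part of the pattern space, so a mark at the end of the very last line is left untouched by this approach.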

edited Apr 25 '18 at 15:13

pevik

answered Nov 15 '11 at 19:27

log0

Nice solution. It does force sed to read the whole file, though, doesn't it? – thiton Nov 15 '11 at 19:30
Ugo, can you confirm that essentially what you're doing is reading in the entire input stream into the processing buffer then doing the search and replace on the entire contents? – dj_segfault Nov 15 '11 at 19:33
@thiton/dj_segfault yes yes first step is appending everything to the pattern space. – log0 Nov 15 '11 at 20:08
Thanks - works perfectly! Now i just need to work out exactly why the parts you've added do the job! :-) – user1048271 Nov 15 '11 at 20:28

thiton · Answer 2 · 2011-11-16T10:03:14.377

6

For real line feeds, use:

sed -e ':a; /mark$/ { N; s/mark\n//; ba; }'

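Every line that ends with mark is joined with the next line, and the \n that is now in the middle is removed together with mark. The branch back to :a repeats this if the joined line ends with mark as well.

If the lines instead end with the literal two-character string \n, you need to escape the \ in the pattern, as \\n.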
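As a sketch of how the loop behaves on consecutive marked lines (assuming real line feeds; join_mark_stream is a hypothetical wrapper name, and the command is the one above split into -e fragments so the label and the closing brace each end at a newline):

```shell
# Hypothetical helper wrapping the looped command from this answer.
join_mark_stream() {
    sed -e ':a' -e '/mark$/{' -e 'N' -e 's/mark\n//' -e 'ba' -e '}' "$@"
}

# Two marked lines in a row are folded into the line that follows them.
printf 'amark\nbmark\nc\nd\n' | join_mark_stream
```

Unlike the slurping approach, this only ever holds the lines currently being joined in the pattern space.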

edited Nov 16 '11 at 10:03

answered Nov 15 '11 at 19:27

thiton

@thiton I don't think it works if two successive lines contain the mark. It is certainly possible to get it to work without putting the whole file in memory though... – log0 Nov 15 '11 at 21:47
@Ugo: Fair enough. Needs to be looped to consider this case. Edited accordingly. – thiton Nov 16 '11 at 10:02

score 2 · Answer 3 · answered Nov 15 '11 at 20:11

I saw awk tag, so here we go.

If \n is a 'line return', awk can join a line ending with a 'mark' with the next line.

$> awk '/mark$/ { sub(/mark$/,""); getline t; print $0 t; next }; 1' ./text 
line1
line2
line3line4
line5
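Note that the one-liner reads exactly one extra line, so when that line also ends in mark it is printed with its mark intact. A possible variant that keeps joining, offered as a sketch rather than as this answer's method (join_mark_awk and next_line are made-up names):

```shell
# Hypothetical helper: join every run of lines ending in "mark".
join_mark_awk() {
    awk '{
        # While the record ends in "mark", strip it and append the next line.
        # The getline result is checked so the loop stops at end of input.
        while (sub(/mark$/, "") && (getline next_line) > 0)
            $0 = $0 next_line
        print
    }' "$@"
}

printf 'line1\nline2\nline3mark\nline4\nline5\n' | join_mark_awk
```

If the last line of the input ends in mark, this sketch strips the mark and still prints the line with its newline.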

answered Nov 15 '11 at 20:11

ДМИТРИЙ МАЛИКОВ

Thanks! Yes, I was interested to see how it could be done in other ways too and thought awk might do it. Thank you - very useful to understand this. – user1048271 Nov 15 '11 at 20:30
I'm not sure about that, but comments are not such a good place for ecstatic emotions. – ДМИТРИЙ МАЛИКОВ Nov 15 '11 at 20:36
@dmitry.malikov I am learning awk so can you please explain your one-liner for +1? :) – jaypal singh Nov 15 '11 at 21:41
@DawnoftheDead: for lines that end in `mark`, it removes `mark`, loads the next line of input in `t`, prints the current line and the next line in `t`, and begins the next cycle. For every other line, `1` is a golfed `{print}`. – ninjalj Nov 15 '11 at 22:34
@dmitry.malikov: if you are looking for a short solution: `'!/mark$/{print}; /mark$/{sub(/mark$/,_); printf $0}'` is short and still readable. – ninjalj Nov 15 '11 at 22:40
@dmitry.malikov: and yes, I know the first `{print};` can be substituted by a single semicolon. – ninjalj Nov 15 '11 at 22:41
@glennjackman: good catch. (my solution works for that case). – ninjalj Nov 16 '11 at 21:31

score 1 · Answer 4 · answered Nov 15 '11 at 20:11

If you can use awk, you can do

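awk '
    # Line ends in mark: strip it and keep the text for the next line.
    /mark$/ {sub(/mark$/, ""); hold = hold $0; next}
    # Any other line: print it after whatever was held, then reset.
    {print hold $0; hold = ""}
    # Flush held text if the last line ended in mark.
    END {if (hold) print hold}
'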

answered Nov 15 '11 at 20:11

glenn jackman

Your solution is kinda overloaded, is all that stuff really important? – ДМИТРИЙ МАЛИКОВ Nov 15 '11 at 20:13
sed is an incredibly terse language. awk is more readable to people comfortable with C-like languages. You can make the decision which is easier for you. And yes, you need all that to implement a solution for this problem. If you like, change the "hold" variable to something shorter. – glenn jackman Nov 15 '11 at 20:18
Okay, but can't we use `getline` instead of the unnecessary if? Is it unsafe or what? – ДМИТРИЙ МАЛИКОВ Nov 15 '11 at 20:21

score 1 · Answer 5 · answered Nov 16 '11 at 11:34

There are already many answers using sed and awk.

I am adding another one with awk, just to show that awk can do it with a shorter command:

awk 'gsub(/mark$/,""){printf "%s", $0;next;}1' input

test:

kent$  echo "line1
line2
line3mark
line4
line5"|awk 'gsub(/mark$/,""){printf "%s", $0;next;}1'

output:

line1
line2
line3line4
line5

I don't know if this is really what the OP wanted.

answered Nov 16 '11 at 11:34

Kent

Hey! that's shorter than mine! +1 – ninjalj Nov 16 '11 at 21:37

score 0 · Answer 6 · answered Nov 25 '16 at 22:27

awk '{sub(/line4/,"line3line4")}!/mark/' file

line1\n
line2\n
line3line4\n
line5\n

answered Nov 25 '16 at 22:27

Claes Wikner

score 0 · Answer 7 · 2011-11-15T19:53:21.007

Just use the following command

sed -e "{:q;N;s/mark\n//g;t q}" test.file

edited Nov 15 '11 at 19:53

answered Nov 15 '11 at 19:27

Wouldn't that remove line3 completely instead of prepending it to the next line? – thiton Nov 15 '11 at 19:28

potong · Answer 8 · 2011-11-16T08:54:17.857

0

This might work for you:

sed -e '1{h;d};H;${x;s/mark\n//g;p};d' test.file

edited Nov 16 '11 at 08:54

answered Nov 15 '11 at 21:11

potong
