
I can't find a suitable sed expression to remove a word followed by a line return (\n).

Test file is:

line1\n
line2\n
line3mark\n
line4\n
line5\n

I want to remove all occurrences of mark\n, leaving, in this case:

line1\n
line2\n
line3line4\n
line5\n

I have searched and can use:

sed 's/\n//g' test.file to remove ALL \n's

but

sed 's/mark\n//g' test.file does not work

Strangely, s/mark\n//g does seem to work ok in vi in interactive mode.

Any help greatly appreciated! I would like to understand how to do it using sed if possible, as I am sure it is possible! However, if it can be done another way I'm also happy, as long as it's on the command line, because it has to run over many files.

Many thanks.

Jim Garrison
user1048271

8 Answers


This should do the trick:

sed -i ':a;N;$!ba;s/mark\n//g' file

Explanation:

;    command separator within sed
:a   a label, like in C/C++
N    appends a newline and the next line of input to the pattern space
$!ba on every line but the last, branches back to label a, so N runs again
-i   edits the file in place (GNU sed)

sed proceeds like this: it reads one line of input into the pattern space (dropping its trailing newline), performs a sequence of editing commands on the pattern space, writes the pattern space to STDOUT, and then repeats with the next line.

When you do something like

sed -i 's/mark\n//' file

lines are copied to the pattern space one by one, so the pattern space never contains a \n and mark\n cannot match.

:a;N;$!ba appends each line to the pattern space, so that after the last line it holds the whole file, newlines included.

Then the pattern space can be processed in one pass, removing every mark\n. The g (global) flag is important here because it tells sed not to stop at the first match.
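To see the difference between the two approaches, here is a small sketch, assuming GNU sed (the `:a;N;$!ba` idiom and `\n` inside the regex are GNU behavior) and a POSIX shell. The function names `join_mark_slurp` and `join_mark_per_line` are made up for illustration, and both read standard input instead of editing a file with -i.

```shell
#!/bin/sh
# Hypothetical helper names; assumes GNU sed.

# Slurp the whole input into the pattern space, then delete every "mark\n".
join_mark_slurp() {
    sed ':a;N;$!ba;s/mark\n//g'
}

# Line-by-line version: sed drops each line's trailing newline before the
# script runs, so "mark\n" never matches and the input passes through unchanged.
join_mark_per_line() {
    sed 's/mark\n//g'
}

sample='line1
line2
line3mark
line4
line5'

printf '%s\n' "$sample" | join_mark_slurp     # line3 and line4 are joined
printf '%s\n' "$sample" | join_mark_per_line  # output is identical to the input
```

The slurping version trades memory for simplicity: the whole file sits in the pattern space before the substitution runs, which is what the comments below point out.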

pevik
log0
  • Nice solution. It does force sed to read the whole file, though, doesn't it? – thiton Nov 15 '11 at 19:30
  • Ugo, can you confirm that essentially what you're doing is reading in the entire input stream into the processing buffer then doing the search and replace on the entire contents? – dj_segfault Nov 15 '11 at 19:33
  • @thiton/dj_segfault yes, the first step is appending everything to the pattern space. – log0 Nov 15 '11 at 20:08
  • Thanks - works perfectly! Now I just need to work out exactly why the parts you've added do the job! :-) – user1048271 Nov 15 '11 at 20:28

For real line feeds, use:

sed -e ':a; /mark$/ { N; s/mark\n//; ba; }'

Every line that ends with mark is joined with the next one, and the \n now in the middle is removed. The ba branch repeats this as long as the joined line still ends with mark.

If there is a literal string \n at the end of the line, you need to escape the \ as \\n.
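The loop matters when two consecutive lines end in mark. A minimal sketch, assuming GNU sed (labels followed by `;` inside braces rely on GNU's parser) and a made-up function name, `join_mark_stream`. Unlike the slurping approach, it only keeps the current run of joined lines in memory, not the whole file.

```shell
#!/bin/sh
# Assumes GNU sed; join_mark_stream is a hypothetical helper name.
# While the pattern space ends in "mark", append the next line (N),
# delete the "mark\n" joint, and branch back to label a to check again.
join_mark_stream() {
    sed -e ':a; /mark$/ { N; s/mark\n//; ba; }'
}

# Two consecutive "mark" lines: line2, line3 and line4 collapse into one line.
printf 'line1\nline2mark\nline3mark\nline4\nline5\n' | join_mark_stream
```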

thiton
  • @thiton I don't think it works if two successive lines contain the mark. It is certainly possible to get it to work without putting the whole file in memory, though... – log0 Nov 15 '11 at 21:47
  • @Ugo: Fair enough. Needs to be looped to consider this case. Edited accordingly. – thiton Nov 16 '11 at 10:02

I saw the awk tag, so here we go.

If \n is a real line return, awk can join a line ending with mark with the next line.

$> awk '/mark$/ { sub(/mark$/,""); getline t; print $0 t; next }; 1' ./text 
line1
line2
line3line4
line5
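A sketch that wraps the one-liner above in a function (the name `join_mark_getline` is hypothetical) and also feeds it two consecutive mark lines. The line fetched with `getline t` is printed as-is and never re-checked, so a second trailing mark survives. Any POSIX awk should behave the same way here.

```shell
#!/bin/sh
# join_mark_getline is a hypothetical name; the awk program is the answer's.
join_mark_getline() {
    awk '/mark$/ { sub(/mark$/,""); getline t; print $0 t; next }; 1'
}

# The sample from the question: line3 and line4 are joined.
printf 'line1\nline2\nline3mark\nline4\nline5\n' | join_mark_getline

# Two "mark" lines in a row: the line read by getline is printed unchanged,
# so the output is "abmark" and "c" rather than "abc".
printf 'amark\nbmark\nc\n' | join_mark_getline
```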
  • Thanks! Yes, I was interested to see how it could be done in other ways too and thought awk might do it. Thank you - very useful to understand this. – user1048271 Nov 15 '11 at 20:30
  • I'm not sure about that, but comments are not really the place for ecstatic emotions. – ДМИТРИЙ МАЛИКОВ Nov 15 '11 at 20:36
  • @dmitry.malikov I am learning awk so can you please explain your one-liner for +1? :) – jaypal singh Nov 15 '11 at 21:41
  • @DawnoftheDead: for lines that end in `mark`, it removes `mark`, loads the next line of input in `t`, prints the current line and the next line in `t`, and begins the next cycle. For every other line, `1` is a golfed `{print}`. – ninjalj Nov 15 '11 at 22:34
  • @dmitry.malikov: if you are looking for a short solution: `'!/mark$/{print}; /mark$/{sub(/mark$/,_); printf $0}'` is short and still readable. – ninjalj Nov 15 '11 at 22:40
  • @dmitry.malikov: and yes, I know the first `{print};` can be substituted by a single semicolon. – ninjalj Nov 15 '11 at 22:41
  • @glennjackman: good catch. (my solution works for that case). – ninjalj Nov 16 '11 at 21:31

If you can use awk, you can do:

awk '
    /mark$/ {sub(/mark$/, ""); hold = hold $0; next}
    {print hold $0; hold = ""}
    END {if (hold) print hold}
'
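A minimal sketch wrapping the program above in a function (hypothetical name `join_mark_hold`) so it can be run on a pipe. Because stripped lines accumulate in `hold` until a line without a trailing mark arrives, runs of consecutive mark lines are joined correctly, and only one run is held in memory at a time.

```shell
#!/bin/sh
# join_mark_hold is a hypothetical name wrapping the awk program above.
# Lines ending in "mark" have it removed and are appended to hold;
# the next line that does not end in "mark" is printed after them.
join_mark_hold() {
    awk '
        /mark$/ {sub(/mark$/, ""); hold = hold $0; next}
        {print hold $0; hold = ""}
        END {if (hold) print hold}
    '
}

printf 'amark\nbmark\nc\n' | join_mark_hold        # prints "abc"
printf 'line1\nline2\nline3mark\nline4\nline5\n' | join_mark_hold
```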
glenn jackman

There are already many answers, both sed and awk.

I am adding another awk one, just to show that awk can do it with a shorter command:

awk 'gsub(/mark$/,""){printf "%s", $0;next;}1' input

test:

kent$  echo "line1
line2
line3mark
line4
line5"|awk 'gsub(/mark$/,""){printf "%s", $0;next;}1'

output:

line1
line2
line3line4
line5

I don't know if this is really what the OP wanted.
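One caveat worth checking: `printf $0` uses the input line itself as the format string, so a joined line containing a `%` (for example `50%mark`) can be garbled or make awk complain. A sketch of the same one-liner with an explicit `"%s"` format, under a hypothetical function name `join_mark_gsub`:

```shell
#!/bin/sh
# join_mark_gsub is a hypothetical name. gsub() returns the number of
# replacements made, so the action runs only for lines that ended in "mark";
# those are printed without a newline, and "1" prints every other line.
join_mark_gsub() {
    awk 'gsub(/mark$/,""){printf "%s", $0; next}; 1'
}

# With the explicit "%s" format the "%" is printed literally: "50%line4".
printf 'line1\n50%%mark\nline4\n' | join_mark_gsub
```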

Kent
awk '{sub(/line4/,"line3line4")}!/mark/' file

line1\n
line2\n
line3line4\n
line5\n
Claes Wikner

Just use the following command:

sed -e "{:q;N;s/mark\n//g;t q}" test.file

This might work for you:

sed -e '1{h;d};H;${x;s/mark\n//g;p};d' test.file
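A commented sketch of how this hold-space version appears to work, assuming GNU sed (for `\n` in the regex) and using a made-up function name, `join_mark_hold_space`. Line 1 replaces the hold space, every later line is appended to it, and each pattern space is deleted so nothing prints early; on the last line, `x` swaps the accumulated text in, the global substitution runs once, and `p` prints it. Note that a one-line input prints nothing, because line 1 is deleted before the `$` block can run.

```shell
#!/bin/sh
# join_mark_hold_space is a hypothetical name; assumes GNU sed.
#   1{h;d}  line 1: copy it to the hold space, print nothing
#   H       later lines: append "\n" plus the line to the hold space
#   ${...}  last line: swap hold and pattern space (x), delete every
#           "mark\n" at once (s///g), print the result (p)
#   d       delete the pattern space so no line is auto-printed
join_mark_hold_space() {
    sed -e '1{h;d};H;${x;s/mark\n//g;p};d'
}

printf 'line1\nline2\nline3mark\nline4\nline5\n' | join_mark_hold_space
```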
potong