
I'm reformatting a file, and I want to perform the following steps:

  1. Replace double CRLFs with a temporary placeholder sequence ($CRLF$ or something similar)
  2. Remove all remaining CRLFs in the whole file
  3. Go back and turn the placeholders into double CRLFs again.

So input like this:

This is a paragraph
of text that has
been manually fitted
into a certain colum
width.

This is another
paragraph of text
that is the same.

will become:

This is a paragraph of text that has been manually fitted into a certain colum width.

This is another paragraph of text that is the same.

It seems this should be possible by piping the input through a few simple sed programs, but I'm not sure how to refer to CRLF in sed (to use in sed 's/<CRLF><CRLF>/$CRLF$/'). Or maybe there's a better way of doing this?

fredley
  • `s//` operates on lines. You cannot change `\r\n` with it. See [this question](http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n). – Piotr Praszmo Jul 09 '12 at 10:55
  • @Banthar Thanks, however it seems you can still use `sed`, in the method described in the answer? – fredley Jul 09 '12 at 10:56
  • Are they actually carriage returns and line-feeds (Windows-style) or are they Unix-style newlines? – Dennis Williamson Jul 09 '12 at 10:56
  • They are using sed to combine the lines together. If you can use perl try [this answer](http://stackoverflow.com/a/1252020/745924). You should be able to do the whole thing in a single run. – Piotr Praszmo Jul 09 '12 at 10:59
  • @DennisWilliamson Windows-style – fredley Jul 09 '12 at 11:00
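
Because the answers below behave differently on CRLF and LF input, it helps to check which kind a file actually has. A minimal sketch; `line_endings` is a hypothetical helper name, and it reports CRLF as soon as any line ends in a carriage return (`file yourfile` also reports "with CRLF line terminators" for such files):

```shell
# Hypothetical helper: print CRLF if any line of the file ends in a carriage
# return, LF otherwise. printf '\r' is used instead of $'\r' so it also runs in sh.
line_endings() {
  cr=$(printf '\r')
  if grep -q "${cr}\$" "$1"; then
    echo CRLF
  else
    echo LF
  fi
}
```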

5 Answers


You can use sed to append a <CRLF> marker to the end of every line:

sed 's/$/<CRLF>/'

then delete every \r and \n character with tr:

| tr -d "\r\n"

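and then replace each double marker with a blank line (GNU sed understands \n in the replacement):

| sed 's/<CRLF><CRLF>/\n\n/g'

and finally replace the leftover single markers with spaces, so the last word of one line doesn't run into the first word of the next:

| sed 's/<CRLF>/ /g'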

There was a one-liner sed that did all this in a single cycle, but I can't find it now.
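
Put together, the marker approach is a single pipeline in which double markers become blank lines and the leftover markers become spaces. A sketch wrapped in a hypothetical `unwrap_paragraphs` function, assuming GNU sed (for `\n` in the replacement text):

```shell
# Hypothetical wrapper; assumes GNU sed for "\n" in the replacement.
# Reads the named files, or stdin if none are given.
#   1. Append a <CRLF> marker to every line.
#   2. Delete every CR and LF, so the input becomes one long line.
#   3. A doubled marker was a paragraph break: turn it into a blank line.
#   4. A single marker was a soft wrap: turn it into a space.
unwrap_paragraphs() {
  sed 's/$/<CRLF>/' "$@" |
    tr -d '\r\n' |
    sed 's/<CRLF><CRLF>/\n\n/g' |
    sed 's/<CRLF>/ /g'
}
```

Usage: `unwrap_paragraphs input.txt > output.txt`. The result has Unix line endings, no final newline, and a trailing space after the last paragraph, because its final line still carried a marker.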

tripleee
LSerni
  • This works exactly as I require, here's the final program I used: `sed -e 's/$/<CRLF>/' $* | tr -d "\r\n" | sed 's/<CRLF><CRLF>/\n\n/g' | sed 's/[ \t]*<CRLF>/ /g'` – fredley Jul 09 '12 at 11:15

Try this:

cat file.txt | sed 's/$/ /;s/^[[:space:]]*$/CRLF/' | tr -d '\r\n' | sed 's/CRLF/\r\n/g'

That's not quite the method you described; here's what it does:

  1. Adds a space to the end of each line.
  2. Replaces any line that contains only whitespace (i.e. blank lines) with "CRLF".
  3. Deletes all line-breaking characters (both CR and LF).
  4. Replaces every occurrence of the string "CRLF" with a Windows-style line break.

This works on Cygwin bash for me.

me_and

Redefine the Problem

It looks like what you're really trying to do is reflow your paragraphs and single-space your lines. There are a number of ways you can do this.

A Non-Sed Solution

If you don't mind installing a few packages outside coreutils, some additional shell utilities make this as easy as:

dos2unix /tmp/foo
fmt -w0 /tmp/foo | cat --squeeze-blank | sponge /tmp/foo
unix2dos /tmp/foo

Sponge is from the moreutils package and lets you write back to the same file you're reading. The dos2unix (or alternatively tofrodos) package lets you convert your line endings back and forth, for easier integration with tools that expect Unix-style line endings.
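
If installing moreutils or dos2unix isn't an option, POSIX awk's paragraph mode (an empty `RS`, so records are separated by blank lines) can do the reflow by itself once the carriage returns are stripped. This swaps in awk for the tools above; `reflow_awk` is a hypothetical function name, and the sketch writes Unix line endings:

```shell
# Hypothetical helper: reflow paragraphs using awk paragraph mode (RS="").
reflow_awk() {
  # Strip CRs first; otherwise a "blank" line of a CRLF file still holds a
  # lone \r, and awk would not treat it as a paragraph separator.
  tr -d '\r' < "$1" |
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         {
           # Within one paragraph, join the wrapped lines with single spaces.
           gsub(/\n/, " ")
           print
         }'
}
```

Each paragraph is printed followed by a blank line, including the last one.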

Todd A. Jacobs

This might work for you (GNU sed):

sed ':a;$!{N;/\n\r\?$/{p;d};s/\r\?\n/ /;ba}' file
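
Spread over several lines with comments, the one-liner reads as below. The blank-line test here is `/\n\r\?$/`, which also matches the lone `\r` that a blank line in a CRLF file leaves behind, so paragraph breaks survive on Windows-style input. A sketch wrapped in a hypothetical `reflow_sed` function; GNU sed is assumed for `\r` and `\?`:

```shell
# Hypothetical wrapper around the one-liner; GNU sed is assumed (\r, \?).
reflow_sed() {
  sed '
    :a
    # Unless this is the last line, append the next line to the pattern space.
    $!{
      N
      # If the appended line is empty (or just a CR), the paragraph is done:
      # print it, then delete it and start a new cycle.
      /\n\r\?$/{
        p
        d
      }
      # Otherwise join the two lines with a space, dropping the CR if present.
      s/\r\?\n/ /
      ba
    }
  ' "$@"
}
```

With CRLF input the joined paragraphs keep their final CR, so the output still has Windows line endings and a blank CRLF line between paragraphs.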
potong

Am I missing why this is not easier?

Add CRLF:

sed 's/$/\r/' < index.html > index_CRLF.html

Remove the CRs to go back to Unix line endings:

sed 's/\r$//' < index_CRLF.html > index.html
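
A quick way to confirm that adding and then stripping the CRs undo each other is to run both conversions and compare the result byte for byte with `cmp`. A sketch assuming GNU sed (for `\r`); `sample.txt` and the other file names are placeholders:

```shell
# Build a small LF-only sample file (placeholder name).
printf 'first line\nsecond line\n' > sample.txt
# Add a CR before every LF, then strip it again (GNU sed understands \r).
sed 's/$/\r/' sample.txt > sample_crlf.txt
sed 's/\r$//' sample_crlf.txt > sample_back.txt
# cmp prints nothing and exits 0 when the two files are identical.
cmp sample.txt sample_back.txt && echo "round trip OK"
```

Note that this only converts line endings; it does not join the wrapped lines into paragraphs.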

Drew Deal