With the Windows line ending, you want to remove the ^M (also written \r, or carriage return), but it seems you want to replace the ^K with a newline.
The command I'd use is tr, twice:
tr -d '\r' < article_filemakerExport.xml | tr '\13' '\12' > tmp.$$ &&
mv tmp.$$ article_filemakerExport.xml || rm -f tmp.$$
Given that one operation is a delete and the other a substitute, I don't think you can combine those into a single tr invocation. You can use cp tmp.$$ article_filemakerExport.xml; rm -f tmp.$$ instead of mv if you're worried about links, etc.
You could also use dos2unix to convert the CRLF to NL line endings instead of tr.
Note that tr is a pure filter; it only reads standard input and only writes to standard output. It does not read or write files directly.
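Because tr only filters standard input, the two-step pipeline is easy to try on a throwaway sample before touching the real file. A minimal sketch; the function name fix_endings and the sample string are made up for illustration:

```shell
# Hypothetical helper: delete carriage returns, then map control-K
# (octal 13) to newline (octal 12), reading stdin and writing stdout.
fix_endings() {
    tr -d '\r' | tr '\13' '\12'
}

# Sample input: "one", control-K, "two", then a CRLF line ending.
# od -c makes any surviving control characters visible.
printf 'one\013two\r\n' | fix_endings | od -c
```

If the od -c output shows only \n between and after the words, the same pipeline should be safe to run against the export file with the redirections shown above.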
Actually, I need to replace both of these with a newline.
That's easier: a single invocation of tr will do the job:
tr '\13\15' '\12\12' < article_filemakerExport.xml > tmp.$$ &&
mv tmp.$$ article_filemakerExport.xml || rm -f tmp.$$
Or, if you prefer:
tr '\13\r' '\n\n' < article_filemakerExport.xml > tmp.$$ &&
mv tmp.$$ article_filemakerExport.xml || rm -f tmp.$$
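One thing worth checking on a small sample first: with this mapping, each byte of a CRLF pair becomes its own newline, so a Windows line ending turns into two newlines. A quick sketch, using an invented sample string:

```shell
# Map both control-K (octal 13) and carriage return (octal 15) to newline.
both_to_newline() {
    tr '\13\15' '\12\12'
}

# "one", control-K, "two", CRLF. od -c shows what each byte became.
printf 'one\013two\r\n' | both_to_newline | od -c
```

If the resulting blank lines are not wanted, the two-step version (delete \r, then translate control-K) avoids them.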
I don't think there's a \z-style notation for control-K, but I'm willing to learn otherwise (it might be vertical tab, \v).
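One way to settle the \v question on a given system is to compare the two spellings directly. This is only a check, assuming the local tr accepts \v as an escape (POSIX lists it, but it is worth confirming):

```shell
# Translate the same sample with the octal escape and with \v.
a=$(printf 'x\013y' | tr '\13' '\n')
b=$(printf 'x\013y' | tr '\v' '\n')

# If both spellings name the same byte, the results are identical.
if [ "$a" = "$b" ]; then
    printf '%s\n' 'tr treats \v as control-K here'
else
    printf '%s\n' 'tr does not treat \v as control-K here'
fi
```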
(Added the && and || rm -f tmp.$$ commands at the hinting of Ed Morton.)
Partial list of control characters
C    Oct  Dec  Hex  Unicode  Name
\a   07   7    07   U+0007   BELL
\b   10   8    08   U+0008   BACKSPACE
\t   11   9    09   U+0009   HORIZONTAL TABULATION
\n   12   10   0A   U+000A   LINE FEED
\v   13   11   0B   U+000B   VERTICAL TABULATION
\f   14   12   0C   U+000C   FORM FEED
\r   15   13   0D   U+000D   CARRIAGE RETURN
You can find a complete set of these control characters at the Unicode site (http://www.unicode.org/charts/PDF/U0000.pdf). No doubt there are many other possible places to look too.