How to convert Windows end of line to Unix end of line (CR/LF to LF)

Question

I'm a Java developer using Ubuntu. The project was created on Windows with Eclipse and uses the Windows-1252 encoding.

To convert it to UTF-8, I used the recode program:

find Web -iname \*.java | xargs recode CP1252...UTF-8

This command gives this error:

recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data'

I searched for it and found a solution in Bash and Windows, Recode: Ambiguous output in step `data..CR-LF', which says:

Convert line endings from CR/LF to a single LF: Edit the file with Vim, give the command :set ff=unix and save the file. Recode now should run without errors.

Nice, but I have many files to remove the CR/LF characters from, and I can't open each one by hand. Vi doesn't provide a command-line option for doing this from Bash.

Can sed be used to do this? How?

`recode` produces this error when trying to recode a file with mixed DOS (`\r\n`, CRLF) and Unix (`\n`, LF) newlines. Unfortunately, `fromdos`, formerly a standalone binary, is now an alias for recode, which has this problem. – Tomas, Feb 19 '14 at 09:38
Related: *[How to convert DOS/Windows newline (CRLF) to Unix newline (LF) in a Bash script](https://stackoverflow.com/questions/2613800)* — Peter Mortensen, Apr 08 '21 at 12:19

score 137 · Answer 1 · edited Feb 19 '14 at 09:32

137

There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
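For a sense of what dos2unix does to a file, here is a minimal Python sketch of its core step, assuming the common case of turning each CR LF pair into a single LF. It is not dos2unix itself (the real tool has many more options), and the function names are hypothetical. Because it works on raw bytes, it copes with files that mix CRLF and LF lines and leaves the CP1252 encoding untouched for a later recode step.

```python
from pathlib import Path


def dos2unix_bytes(data: bytes) -> bytes:
    """Turn every CR LF pair into a single LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def dos2unix_file(path: str) -> bool:
    """Rewrite `path` in place with Unix line endings.

    Works on raw bytes, so the file's encoding (e.g. CP1252) is untouched.
    Returns True if the file was changed, False if it had no CR LF pairs.
    """
    p = Path(path)
    original = p.read_bytes()
    converted = dos2unix_bytes(original)
    if converted == original:
        return False
    p.write_bytes(converted)
    return True
```

Calling dos2unix_file a second time on the same file returns False, so it is safe to rerun over a whole tree.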

edited Feb 19 '14 at 09:32

Tomas

answered Oct 08 '10 at 13:40

cHao

I've installed tofrodos, which provides the fromdos command, but the problem persists. fromdos -a GravacaoMessageHelper.java; recode CP1252...UTF-8 GravacaoMessageHelper.java returns: recode: GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data' – MaikoID Oct 08 '10 at 14:02

@MaikoID: Then you have bigger problems. recode shouldn't care about line endings anyway, as a CR is just another character to convert. And it doesn't seem to care on my machine. – cHao Oct 08 '10 at 14:42

`fromdos` is just an alias to `recode`, and that will produce the error the OP mentioned on files with mixed DOS (\r\n, CRLF) and Unix (\n, LF) line endings. Only `dos2unix` works universally. – Tomas Feb 19 '14 at 09:32

dos2unix is available on OS X via Homebrew: "brew install dos2unix" – Joseph Sheedy Oct 19 '16 at 18:33

Just to follow up on this, I ran into the same problem and ended up using the following: `find ./ -name "*.java" -exec dos2unix {} +`. – amracel Jun 23 '19 at 13:21
`dos2unix` is not installed by default on [Ubuntu MATE 20.04](https://en.wikipedia.org/wiki/Ubuntu_MATE#Releases) (Focal Fossa). – Peter Mortensen Apr 08 '21 at 12:17

score 107 · Answer 2 · edited Apr 08 '21 at 12:36

sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:

sed -i 's/\r//g' file

Warning: this will change the original file.

However, this approach cannot convert Unix line endings to DOS or old Mac (\r) line endings. Further reading:

How can I replace a newline (\n) using sed?
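The pattern-space point above can be illustrated with a small Python model of `sed 's/\r//g'` (a sketch; the function name is hypothetical): each line is taken without its trailing \n, every \r in it is deleted, and the \n is put back on output. Note that this deletes a \r anywhere in the line, not only at the end.

```python
def sed_delete_cr(text: str) -> str:
    r"""Model of `sed 's/\r//g'` applied to a whole file's text.

    sed removes each line's trailing \n before the line enters the pattern
    space, so the substitution only sees the \r characters; the \n is put
    back on output. Every \r is deleted, not just the one at the line end.
    """
    # split("\n") mirrors sed's line splitting; join restores the newlines,
    # including a final one if the text ended with it.
    return "\n".join(line.replace("\r", "") for line in text.split("\n"))
```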

edited Apr 08 '21 at 12:36

Peter Mortensen

answered Oct 09 '13 at 21:51

Jichao

+1 This is a nice solution! But you should note that **`sed -i` will change the original file**! People wouldn't expect `sed` to behave this way, so a warning is appropriate here. Not many people know `-i`, so they will try `sed -i ... file > file2` and not expect the original file to be modified. – Tomas Feb 19 '14 at 09:52
Not all `sed` variants recognize the nonstandard symbolic sequence `\r`. Try with a literal ctrl-M character in that case (in many shells, type ctrl-V ctrl-M to produce the literal control character). – tripleee Aug 29 '20 at 13:50
Nice solution for me; it works on my .ksh files. – user3437460 Jul 30 '21 at 11:41
Is this safe to use on Linux files as well? So if you are unsure, you can just run it without checking first? – Natan Aug 30 '22 at 14:11

score 17 · Answer 3 · edited Apr 08 '21 at 12:39

Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:

:args **/*.java
:argdo set ff=unix | update

The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second command visits each file in the argument list in turn and:

Sets the line endings to Unix style (you already know this)
Writes the file out only if it has been changed

:argdo itself moves on to the next file after running these commands.
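Outside Vim, the same batch pass can be sketched in Python (an illustrative stand-in, not part of Vim; the function name is hypothetical): walk the tree like **/*.java, turn CR LF into LF, and write a file back only when it changed, which is what :update does. It does not reproduce Vim's file-format detection; it simply replaces every CR LF pair.

```python
from pathlib import Path


def convert_tree(root: str, pattern: str = "*.java") -> list[str]:
    """Give every file matching `pattern` under `root` Unix line endings.

    Like `:args **/*.java` plus `:argdo set ff=unix | update`, files are
    visited recursively and only written when their content changes.
    Returns the paths that were rewritten.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:  # the ":update" part: skip unchanged files
            path.write_bytes(converted)
            changed.append(str(path))
    return changed
```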

edited Apr 08 '21 at 12:39

Peter Mortensen

answered Aug 19 '14 at 13:59

Arandur

This is probably much slower than using `dos2unix` in a for-loop, but it's still nice to know how to do it in Vim! – jpaugh Aug 04 '15 at 03:45

I ::heart:: my vim. Thank you for this. – Jono Feb 03 '16 at 19:01

score 12 · Answer 4 · edited Jul 06 '21 at 18:55

I'll take a little exception to Jichao's answer. You can actually do everything he says cannot be done fairly easily. Instead of looking for a \n, just look for a carriage return at the end of the line.

sed -i 's/\r$//' "${FILE_NAME}"
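To see which carriage returns `s/\r$//` removes, here is a Python sketch (the function name is hypothetical) using a regular expression that matches a \r only right before a \n or at the very end of the text, the places where sed's `\r$` can match. A \r in the middle of a line survives, unlike with `s/\r//g`.

```python
import re

# A CR directly before a LF, or at the very end of the text (a last line with
# no trailing newline): the positions where sed's `\r$` can match.
TRAILING_CR = re.compile(r"\r(?=\n|\Z)")


def strip_trailing_cr(text: str) -> str:
    r"""Model of `sed 's/\r$//'`: drop one CR at the end of each line."""
    return TRAILING_CR.sub("", text)
```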

To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r to make this easier with extended regular expressions.)

sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"

Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
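For comparison, when the whole file is read into memory instead of being processed line by line as sed does, no line joining is needed. A minimal Python sketch (not a sed script; the function name is hypothetical) that turns LF endings into classic Mac OS CR endings, assuming any CR LF pairs should be normalized first:

```python
def unix_to_mac(text: str) -> str:
    """Convert LF line endings to classic Mac OS CR line endings.

    Assumption: any CR LF pairs are first normalized to LF, so DOS input does
    not end up with doubled CRs.
    """
    return text.replace("\r\n", "\n").replace("\n", "\r")
```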

Warning: -i changes the actual file. If you want a backup, append a suffix directly after -i (for example, -i.bak). The existing file will then be moved to a file with the same name with your suffix added to the end.
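A rough Python equivalent of `sed -i.bak 's/\r$//' file`, sketched to show what the backup gives you. The function name and the default .bak suffix are illustrative assumptions, and sed's own file handling differs in detail: here the original is copied first, and only then is the file rewritten.

```python
import re
import shutil
from pathlib import Path


def convert_with_backup(path: str, suffix: str = ".bak") -> Path:
    r"""Strip trailing CRs in place, keeping the original as `path + suffix`.

    Loosely like `sed -i.bak 's/\r$//' path`; the backup is made first, so the
    original bytes survive even if the rewrite fails.
    """
    src = Path(path)
    backup = src.with_name(src.name + suffix)
    shutil.copy2(src, backup)  # copy the untouched original (and its timestamps)
    data = src.read_bytes()
    # Remove a CR only where it ends a line: before LF or at end of file.
    src.write_bytes(re.sub(rb"\r(?=\n|\Z)", b"", data))
    return backup
```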

Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also means -r is no longer required, and empty lines (which (.)$ cannot match, since there is no character to capture) now get a carriage return too:

sed -i 's/$/\r/' "${FILE_NAME}"
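A Python model of what `sed 's/$/\r/'` does, useful for checking edge cases (a sketch; the function name is hypothetical). It assumes GNU sed's behavior of not adding a newline the input lacked, so a last line without a trailing newline gets a CR but no LF.

```python
def sed_append_cr(text: str) -> str:
    r"""Model of `sed 's/$/\r/'`: append a CR to every line.

    Like the sed command, it is not idempotent: a line already ending in CR
    gets a second one, so run it only on files with Unix line endings.
    """
    if not text:
        return ""  # an empty file has no lines, so sed outputs nothing
    has_final_newline = text.endswith("\n")
    lines = text.split("\n")
    if has_final_newline:
        lines.pop()  # the empty string after the final \n is not a line
    converted = "\n".join(line + "\r" for line in lines)
    # Assumes GNU sed behavior: no newline is added if the input lacked one.
    return converted + "\n" if has_final_newline else converted
```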

edited Jul 06 '21 at 18:55

answered May 26 '17 at 20:51

John Chesshir

I like your suggestion, but it's just missing a closing single quote. It should be: sed -ri 's/(.)$/\1\r/' ${FILE_NAME} – mgouin Jul 18 '18 at 22:45

@mgouin Thanks for noting that. I've added the missing single quote. – John Chesshir Aug 24 '18 at 16:34

For converting LF to CRLF, capturing the last character preceding the end of line isn't required and might have an impact on performance as well. In my case it is sufficient to do `sed -i 's/$/\r/' ${FILE_NAME}` ... – Thomas Urban Jul 13 '20 at 21:16
The `-r` option is not portable; if your `sed` doesn't have it, maybe try `-E`. – tripleee Aug 29 '20 at 13:55
@ThomasUrban Thank you for that info. I've added an update with the simplification to allow people to see it sooner. I'm leaving the original expression, though, so that people who read your comment don't get confused reading your statement. – John Chesshir Jul 06 '21 at 18:58

score 9 · Answer 5 · edited Aug 29 '20 at 13:52

The tr command can also do this:

tr -d '\15\32' < winfile.txt > unixfile.txt

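tr is a standard Unix utility, so it should already be available to you. The octal escapes \15 and \32 are the carriage return (CR) and Ctrl-Z (the old DOS end-of-file marker), so both characters are deleted.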
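The same deletion expressed in Python, as a sketch for seeing exactly which bytes go (the function and constant names are hypothetical). Like tr -d, it removes these bytes wherever they occur, not just at line ends.

```python
# Octal 15 is decimal 13 (CR, carriage return) and octal 32 is decimal 26
# (Ctrl-Z, the old DOS end-of-file marker): the two bytes tr is told to delete.
DELETE = bytes([0o15, 0o32])


def tr_delete(data: bytes) -> bytes:
    r"""Model of `tr -d '\15\32'`: drop every CR and Ctrl-Z byte, anywhere."""
    return data.translate(None, delete=DELETE)
```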

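Because tr reads only from standard input and cannot take file names as arguments, you'll need to run it from a script that loops over the files. For example, create a file myscript.sh:

#!/bin/bash

# For every .java file below the current directory: strip CR (\15) and
# Ctrl-Z (\32) with tr, then convert the file from CP1252 to UTF-8.
# find -print0 with read -d '' keeps file names containing spaces intact.
find . -iname '*.java' -print0 | while IFS= read -r -d '' f; do
    echo "$f"
    tr -d '\15\32' < "$f" > "$f.tr" && mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"
done

Running myscript.sh processes all the Java files in the current directory and its subdirectories.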
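If recode keeps failing, both steps of the loop above (tr, then recode) can be done in one pass in Python. This is a sketch under assumptions: the input is CP1252, bytes undefined in CP1252 (such as 0x81) are replaced with U+FFFD rather than aborting, and the function names are hypothetical. Unlike find -iname, rglob("*.java") matches case-sensitively on Linux.

```python
from pathlib import Path


def cp1252_dos_to_utf8_unix(data: bytes) -> bytes:
    """Delete CR and Ctrl-Z bytes, then re-encode from CP1252 to UTF-8.

    Assumption: bytes undefined in CP1252 (e.g. 0x81) become U+FFFD
    instead of aborting the conversion.
    """
    stripped = data.translate(None, delete=b"\r\x1a")  # same bytes as tr -d '\15\32'
    return stripped.decode("cp1252", errors="replace").encode("utf-8")


def convert_java_tree(root: str = ".") -> None:
    """Apply the conversion in place to every *.java file under `root`."""
    for path in Path(root).rglob("*.java"):
        if path.is_file():
            path.write_bytes(cp1252_dos_to_utf8_unix(path.read_bytes()))
```

Run it only once per file: a second pass would decode the UTF-8 output as CP1252 again and garble every non-ASCII character.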

edited Aug 29 '20 at 13:52

tripleee

answered Oct 08 '10 at 13:44

KeithL

How can I adapt this to find Web -iname \*.java | xargs recode CP1252...UTF-8? – MaikoID Oct 08 '10 at 13:53
You would need to run tr within a bash script, since it can't work on file names. I'll edit my answer with a sample script. – KeithL Oct 08 '10 at 14:49
Thanks for the answer, but the error persists =| Ambiguous output in step `CR-LF..data' – MaikoID Oct 08 '10 at 16:49

score 6 · Answer 6 · edited Aug 05 '22 at 08:23

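To get past

Ambiguous output in step `CR-LF..data'

the simplest solution may be to add recode's -f (--force) flag, which forces the conversion even when it is not reversible, e.g. recode -f CP1252...UTF-8 GravacaoMessageHelper.java.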

edited Aug 05 '22 at 08:23

answered May 16 '12 at 13:29

V_V

Herein lyeth the answer – caduceus Aug 03 '22 at 06:23

score 0 · Answer 7 · edited Apr 08 '21 at 12:21

Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic):

#!/usr/bin/env python3
"""Convert a CP1252 file to UTF-8, falling back to Latin-1 line by line."""

import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

# Binary mode: each line is raw bytes, and CR/LF line endings are kept as-is.
with open(input_file_name, 'rb') as input_file, \
        open(output_file_name, 'wb') as output_file:
    for line_number, input_line in enumerate(input_file, start=1):
        try:  # first try to decode it using cp1252 (Windows, Western Europe)
            text = input_line.decode('cp1252')
        except UnicodeDecodeError as error:
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))
            # fall back to latin1 (ISO 8859-1), which can decode any byte
            text = input_line.decode('latin1')
        output_file.write(text.encode('utf-8'))

You can use that script with

$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql

score -1 · Answer 8 · answered Oct 08 '10 at 14:10

Go back to Windows, tell Eclipse to change the encoding to UTF-8, then back to Unix and run d2u on the files.

answered Oct 08 '10 at 14:10

Jonathan

Although if there's a lot of files, this may be more work than you're willing to put into it... – Jonathan Oct 08 '10 at 14:11
What is d2u and where to find it? – Jesper Rønn-Jensen Sep 29 '11 at 10:37
It gets renamed occasionally. It looks like Ubuntu calls it `fromdos` in 10.04, and it's part of the package `tofrodos`. – Jonathan Nov 21 '11 at 23:02
