37

I have some templates that all end in LF. I can fill in some terms with format and still get LF files by opening the output with "wb".

Those templates are used in a deployment script on a Windows machine to deploy to a Unix server.

The problem is that a lot of people are going to edit those templates, and I'm 100% sure that some of them will put CRLF line endings inside.

How could I, using Python, convert all the CRLF to LF?

Neuron
Heetola

4 Answers

74

Convert line endings in-place (with Python 3)

Line endings:

  • Windows - \r\n, called CRLF
  • Linux/Unix/MacOS - \n, called LF

Windows to Linux/Unix/MacOS (CRLF ➡ LF)

Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.

# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()
    
# Windows ➡ Unix
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

# Unix ➡ Windows
# content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)

Linux/Unix/MacOS to Windows (LF ➡ CRLF)

To convert from Linux/Unix/MacOS to Windows instead, simply uncomment the Unix ➡ Windows replacement (remove the # in front of the line).

DO NOT comment out the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that there are no CRLF line endings already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting lines from CRLF to LF first and then doing the desired conversion from LF to CRLF avoids this issue (thanks @neuralmer for pointing that out).
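The two-step order described above could be wrapped in a small helper. This is only a minimal sketch; the function name to_crlf is made up for illustration and is not part of the script above.

```python
def to_crlf(content: bytes) -> bytes:
    """Return content with every line ending as CRLF (hypothetical helper)."""
    # Step 1: normalise existing CRLF to LF so no line is converted twice.
    content = content.replace(b'\r\n', b'\n')
    # Step 2: every line break is now a bare LF, so this cannot yield CRCRLF.
    return content.replace(b'\n', b'\r\n')


mixed = b'one\r\ntwo\nthree\r\n'
print(to_crlf(mixed))                 # b'one\r\ntwo\r\nthree\r\n'
print(mixed.replace(b'\n', b'\r\n'))  # naive: b'one\r\r\ntwo\r\nthree\r\r\n'
```

The second print shows the CRCRLF problem that appears when the first step is skipped on a file with mixed endings.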


Code Explanation

Binary Mode

Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

When a file is opened in text mode (mode='r' or mode='w' without b) with the default newline setting, Python translates line endings itself: on reading, \r\n and \r are converted to \n on every platform; on writing, \n is converted to the platform's line separator (\r\n on Windows). So after a text-mode read, the call to content.replace() would not find any \r\n line endings to replace, and a text-mode write on Windows would put the \r\n back.

In binary mode, no such conversion is done. Therefore the call to bytes.replace() can do its work.
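One way to see the difference is to read the same bytes in both modes. The following is a small demonstration, assuming Python 3 and the default newline setting for text mode; the temporary file is only there to keep the example self-contained.

```python
import os
import tempfile

# Write a small file with Windows line endings, byte for byte.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'a\r\nb\r\n')

# Text mode (default newline=None): '\r\n' is already translated on read.
with open(path, 'r') as f:
    text = f.read()

# Binary mode: the bytes arrive untouched.
with open(path, 'rb') as f:
    raw = f.read()

os.remove(path)

print(repr(text))  # 'a\nb\n'
print(repr(raw))   # b'a\r\nb\r\n'
```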

Binary Strings

In Python 3, string literals are Unicode text (type str) unless declared otherwise. But we open our files in binary mode, which gives us bytes. Therefore we need to add b in front of our replacement strings to make them bytes literals, too.

Raw Strings

On Windows the path separator is a backslash \ which we would need to escape in a normal Python string with \\. By adding r in front of the string we create a so-called "raw string", in which backslashes are taken literally. So you can directly copy/paste the path from Windows Explorer into your script.

(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)

Alternative solution

We open the file twice to avoid the need of repositioning the file pointer. We could also have opened the file once with mode='rb+' but then we would have needed to move the pointer back to start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).

Simply opening the file again in write mode does that automatically for us.
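For comparison, the single-open variant described above might look like the sketch below. The function name crlf_to_lf_inplace is hypothetical; the seek and truncate calls follow the order given in the text.

```python
def crlf_to_lf_inplace(file_path):
    """Convert CRLF to LF with a single open() call (hypothetical helper)."""
    with open(file_path, 'rb+') as open_file:
        content = open_file.read()
        content = content.replace(b'\r\n', b'\n')
        open_file.seek(0)       # move the pointer back to the start
        open_file.truncate(0)   # drop the original content
        open_file.write(content)
```

It could be called as crlf_to_lf_inplace(file_path) with the same file_path as in the script above.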

Cheers and happy programming,
winklerrr

user202729
winklerrr
  • I don't get my `content` from a file, and hence can't read in binary mode. Basically I have a 'multiline string'. Hence, I get `TypeError: replace() argument 1 must be str, not bytes`. Is there a solution for that? – AstroFloyd Nov 09 '19 at 10:31
  • @AstroFloyd you first need to convert your string to bytes: `byte_str = your_str.encode("UTF-8")`. Then replace the line endings for `byte_str`. To convert it back to a string use: `your_new_str = byte_str.decode("UTF-8")`. – winklerrr Nov 12 '19 at 07:51
  • Thanks, I open my file with 'w' in windows, but it is converted to crlf. So this is not true for me: `opening files in text mode (mode='r' or mode='w' without b), the platform's native line endings (\r\n on Windows and \r on old Mac OS versions) are automatically converted to Python's Unix-style line endings: \n` – Timo Nov 25 '20 at 14:05
  • @Timo according to the official [documentation](https://docs.python.org/3/library/functions.html#open) for the open function: "There is an additional mode character permitted, 'U', which no longer has any effect, and is considered deprecated. It previously enabled universal newlines in text mode, which became the default behaviour in Python 3.0. Refer to the documentation of the newline parameter for further details. Note Python doesn’t depend on the underlying operating system’s notion of text files; all the processing is done by Python itself, and is therefore platform-independent." – winklerrr Nov 29 '20 at 14:24
  • For converting LF to CRLF line-endings, it is important to know that there are not any lines that already end in CRLF, otherwise you may end up with some lines that "end with" CR CR LF. Converting lines from CRLF to LF first and then doing the conversion from LF to CRLF will avoid this issue. – neuralmer Feb 15 '22 at 21:52
  • @neuralmer thanks! I've updated my answer accordingly. – winklerrr Feb 16 '22 at 11:00
  • I recommend binary mode for `content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)` because it doesn't change encoding. – Constantin Hong Jul 08 '23 at 16:30
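For text that is already a Python str in memory (the case raised in the comments above), the same replacement should also work directly with str arguments, without a round trip through bytes. A small sketch; the helper name normalize_newlines is invented here.

```python
def normalize_newlines(text: str) -> str:
    """Return text with CRLF line endings replaced by LF (hypothetical helper)."""
    # str.replace needs str arguments; passing bytes raises TypeError.
    return text.replace('\r\n', '\n')


print(repr(normalize_newlines('first\r\nsecond\nthird\r\n')))
# 'first\nsecond\nthird\n'
```

If the result is later written to a file on Windows, opening that file with newline='\n' (or in binary mode after encoding) keeps the LF endings.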
27

Python 3:

The default newline type for open is universal, in which case it doesn't mind which sort of newline each line has. You can also request a specific form of newline with the newline argument for open.

Translating from one form to the other is thus rather simple in Python:

with open('filename.in', 'r') as infile, \
     open('filename.out', 'w', newline='\n') as outfile:
    outfile.writelines(infile.readlines())

Python 2:

The open function supports universal newlines via the 'rU' mode.

Again, translating from one form to the other:

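# Python 2's built-in open() has no newline argument; writing in binary
# mode ('wb') keeps the '\n' produced by universal-newline reading as-is.
with open('filename.in', 'rU') as infile, \
     open('filename.out', 'wb') as outfile:
    outfile.writelines(infile.readlines())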

(In Python 3, mode U is deprecated and was removed in Python 3.11; the equivalent form is newline=None, which is the default.)

Neuron
Yann Vernier
  • Can it write to the same file this way? – mercury Apr 08 '22 at 05:30
  • There's a considerable risk of data loss involved in rewriting the data to the same file, as there will be a time when the file is only partially written. The `'w'` mode truncates the file, effectively deleting all its content, so you wouldn't want to open it both ways simultaneously either. You could open in `'r+'` mode, and do `file.seek(0)` between the read and write steps, then finally `file.truncate()` to remove excess data at the end, but it still risks corruption while the data isn't complete. Oh, and the `newline` change means we do need a second `open`, not just a `seek`, or a `reconfigure`. – Yann Vernier Apr 13 '22 at 11:22
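One common way to reduce the partial-write risk mentioned in the comment above is to write the converted data to a temporary file in the same directory and then swap it into place with os.replace. The sketch below is an assumption-laden illustration, not part of the answer: the function name rewrite_with_lf is hypothetical, and it works in binary mode so the encoding is left untouched.

```python
import os
import tempfile


def rewrite_with_lf(path):
    """Rewrite path with LF endings via a temporary file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, 'rb') as infile:
        content = infile.read().replace(b'\r\n', b'\n')
    # Write the converted copy next to the original, in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as outfile:
            outfile.write(content)
        # Swap the finished copy into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.remove(tmp_path)
        raise
```

Note that the temporary file is created with its own permissions, so the replaced file may not keep the original file's mode.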
1

Why not try the following, where text is your string:

text = text.replace('\r\n', '\n')

CRLF => \r\n
LF => \n

Neuron
  • It won't work just like this, as Python changes \n to the current system default line ending (which is CRLF on Windows) when writing in text mode. So you need to use binary mode (to prevent Python from making any changes) or use the `newline` parameter as shown in the upvoted answers. – Leonid Mednikov May 14 '20 at 06:58
  • In addition to @LeonidMednikov, I recommend binary mode because it doesn't change encoding. – Constantin Hong Jul 08 '23 at 16:28
0

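It is possible to fix existing templates with messed-up line endings with this code (Python 3):

# newline='' keeps the original line endings when reading, so '\r\n' is
# still there to replace; newline='\n' stops Windows from writing CRLF.
with open('file.tpl', newline='') as template:
    lines = [line.replace('\r\n', '\n') for line in template]
with open('file.tpl', 'w', newline='\n') as template:
    template.writelines(lines)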
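Since the question involves many templates, the same idea could be applied to a whole directory before each deployment. The sketch below assumes the templates use a .tpl extension (as in file.tpl above) and live under one folder; the function name convert_templates and the default pattern are illustrative only. It works in binary mode, so it does not depend on the platform's newline translation.

```python
from pathlib import Path


def convert_templates(root, pattern='*.tpl'):
    """Convert CRLF to LF in every matching file under root.

    Hypothetical helper: the '*.tpl' pattern is an assumption.
    Returns the list of paths that were rewritten.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        content = path.read_bytes()
        if b'\r\n' not in content:
            continue  # already LF-only, leave the file untouched
        path.write_bytes(content.replace(b'\r\n', b'\n'))
        changed.append(path)
    return changed
```

Calling convert_templates('templates') would then return the paths that actually contained CRLF, which can be handy for logging who broke what.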
apr