
I've searched the web and tried a number of the things I've read, but I can't seem to get the desired result.

I'm using Windows 7 and Python 3.6.

I'm connecting to an Oracle DB with cx_Oracle and creating a text file from the query results. The file that is created (which I'll call my_file.txt for simplicity) has 3688 lines, all ending in CRLF, which need to be converted to Unix LF.

If I run python crlf.py my_file.txt, everything is converted correctly and there are no issues, but that means running another command manually, which I don't want to do.

So I tried adding the code below to my script:

filename = "NameOfFileToBeConverted"
fileContents = open(filename,"r").read()
f = open(filename,"w", newline="\n")
f.write(fileContents)
f.close()

This does convert most of the CRLFs to LF, but at line 3501 there is a single line containing 3500 NUL characters, followed by a row of data from the database that ends in CRLF, and every line from there on still ends in CRLF.

Since that didn't work, I removed it and then tried:

import subprocess
subprocess.Popen("crlf.py "+ filename, shell=True)

I also tried using

import os
os.system("crlf.py "+ filename)

The "+ filename" in the two examples above simply passes the name of the file created during the data extract.

I don't know what else to try from here.

halfer
torz
  • Why not use the correct newline when generating the file in the first place? – Ignacio Vazquez-Abrams Nov 08 '17 at 11:27
  • This is what I used when writing the file: `csv.writer(open("D:/Users/username/Desktop/EXTRACT_INPUT_"+dt_today+".txt","w"), delimiter=",", lineterminator="\n")`. The \n is the correct newline, isn't it? – torz Nov 08 '17 at 11:54

1 Answer


Convert Line Endings in-place (with Python 3)

Windows to Linux/Unix

Here is a short script that converts Windows line endings (\r\n, also called CRLF) directly to Linux/Unix line endings (\n, also called LF) in place, without creating an extra output file:

# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()

content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)
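Since the question is about avoiding a second manual command, the same conversion can be wrapped in a function and called from the extract script itself. This is a minimal sketch; the function name convert_crlf_to_lf is hypothetical, and it assumes the output file has already been closed, so that all buffered data is on disk before it is re-read.

```python
def convert_crlf_to_lf(file_path):
    """Rewrite file_path in place, replacing every CRLF with LF.

    Returns the number of line endings that were converted.
    """
    # Binary mode: read the raw bytes without any newline translation.
    with open(file_path, 'rb') as open_file:
        content = open_file.read()

    count = content.count(b'\r\n')
    content = content.replace(b'\r\n', b'\n')

    # Reopening in 'wb' truncates the file before writing the new content.
    with open(file_path, 'wb') as open_file:
        open_file.write(content)

    return count
```

In the question's script this would be called as convert_crlf_to_lf(filename) after the file written by csv.writer has been closed (for example by opening it in a with block), replacing the subprocess.Popen or os.system call.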

Linux/Unix to Windows

Just swap the arguments: content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING).
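A plain swap assumes the file contains no CRLF endings yet: any existing \r\n would become \r\r\n, because its \n is replaced as well. For files that may contain mixed endings, a safer sketch normalizes everything to LF first. The function name unix_to_windows is hypothetical.

```python
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'


def unix_to_windows(content):
    """Convert LF line endings in a bytes object to CRLF.

    Existing CRLF endings are first normalized to LF, so the second
    replacement cannot turn them into CR CR LF.
    """
    normalized = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
    return normalized.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)


mixed = b'first\r\nsecond\nthird\n'
print(mixed.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING))  # b'first\r\r\nsecond\r\nthird\r\n'
print(unix_to_windows(mixed))                                # b'first\r\nsecond\r\nthird\r\n'
```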


Code Explanation

  • Important: Binary Mode

    We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

    When reading a file in text mode (mode='r', without b) with the default newline=None, Python's universal newlines mode converts \r\n (Windows) and \r (old Mac OS) line endings to \n on every platform. So content.replace() would find no \r\n left to replace. Conversely, when writing in text mode, every \n is translated to the platform's native line ending (os.linesep), so on Windows it would be written back as \r\n.

    In binary mode, no such conversion is done.

  • Binary Strings

    In Python 3, a plain string literal such as '\r\n' is a str, a sequence of Unicode characters, whereas reading a file in binary mode returns bytes. The two types can't be mixed (calling replace('\r\n', '\n') on a bytes object raises a TypeError), so we add b in front of our replacement strings to make them bytes literals, too.

  • Raw Strings

    On Windows the path separator is a backslash \, which would have to be escaped as \\ in a normal Python string. Adding r in front of the string creates a so-called raw string, in which backslashes need no escaping, so you can copy and paste the path directly from Windows Explorer. (With this particular path, a normal string literal would even be a SyntaxError, because \U starts a Unicode escape sequence.)

  • Alternative

    We open the file twice to avoid having to reposition the file pointer. We could also have opened the file only once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate the original content before writing the new one (open_file.truncate(0)).

    Simply opening the file again in write mode does that automatically for us.
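For comparison, here is a sketch of the single-open variant described in the last point, using mode='rb+' with seek(0) and truncate(0). The function name convert_crlf_to_lf_single_open is hypothetical.

```python
def convert_crlf_to_lf_single_open(file_path):
    """Convert CRLF to LF in place using one 'rb+' file handle."""
    with open(file_path, 'rb+') as open_file:
        content = open_file.read()
        content = content.replace(b'\r\n', b'\n')

        open_file.seek(0)        # move the pointer back to the start
        open_file.truncate(0)    # drop the original content
        open_file.write(content)
```

Because converting CRLF to LF can only shrink the file, an equivalent ordering is to seek(0), write the new content, and then call truncate() with no argument, which cuts the file off at the current position.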

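The text-mode behaviour from the Binary Mode point can be checked directly. The sketch below uses a temporary file purely for illustration. It shows that reading in text mode with the default newline=None already turns \r\n into \n, and that passing newline='' when opening a file for writing disables the \n to os.linesep translation. The csv module documentation also recommends newline='' for files passed to csv.writer, so opening the extract file from the question's comment that way should make lineterminator="\n" produce LF endings directly, even on Windows.

```python
import csv
import os
import tempfile

# Temporary file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# 1) Write raw CRLF bytes, then read them back in text mode.
with open(path, 'wb') as f:
    f.write(b'a\r\nb\r\n')
with open(path, 'r') as f:               # default newline=None: universal newlines
    text = f.read()
print(repr(text))                        # the \r\n pairs arrive as plain \n

# 2) Write rows with csv.writer to a file opened with newline=''.
with open(path, 'w', newline='') as f:   # no \n -> os.linesep translation
    writer = csv.writer(f, delimiter=',', lineterminator='\n')
    writer.writerow(['1', 'foo'])
    writer.writerow(['2', 'bar'])
with open(path, 'rb') as f:
    raw = f.read()
print(raw)                               # LF endings on every platform

os.remove(path)
```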
Cheers and happy programming,
winklerrr

  • I was having so many issues working with Windows-based files and their line endings; I was baffled and ready to pull my hair out until I came across your answer. – Breadtruck Mar 12 '23 at 23:24