What is the best way to call the following Unix command from Python?

cat file1.txt | tr -d '\r' > file2.txt

I tried the following cases:
1.

cmd = "cat file1 | tr -d \'\r\'> file2"
args = shlex.split(cmd)
p = subprocess.Popen(args, shell=True)

I got:

cat: stdin: Input/output error

2.

f = open(file2, "w")
p = subprocess.call(args, stdout=f)

I got:

cat: |: No such file or directory
cat: tr: No such file or directory
cat: -d: No such file or directory
cat: \r: No such file or directory

3.

p = subprocess.Popen(args, stdout=subprocess.PIPE)
(out,err) = p.communicate()
print(out)

This works, but when I use file.write(out) instead of print(out), I get the same error as in case 2, and I do not know why.

Shannon
  • I might get told to answer the question as asked, but: there's no reason to use cat and tr from Python to remove carriage returns. Open the file, read the data, and write it out transformed. – Ned Batchelder Jan 24 '18 at 02:55
  • @Ned Batchelder, the file may be pretty big, so I do not want to open and edit it in Python. – Shannon Jan 24 '18 at 02:56
  • @Shabnam you don't have to have all the data in memory. Your Python program can do just what tr does. – Ned Batchelder Jan 24 '18 at 02:57
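
On the question as asked: the errors in cases 1 and 2 are consistent with how shlex.split and subprocess treat the command. shlex.split only tokenizes, so "|" and ">" come out as plain words. Without a shell (case 2) they are handed to cat as file names; with shell=True and a list (case 1), on POSIX only the first item is run as the command, so cat ends up reading stdin. A minimal sketch of this, plus one shell-free way to run tr with Python doing the redirection, is below. The helper name strip_cr_with_tr is hypothetical, and the sketch assumes a POSIX system with tr on the PATH.

```python
import shlex
import subprocess

# The pipeline from the question, written as a raw string so that \r stays
# as the two characters backslash + r instead of a real carriage return.
cmd = r"cat file1.txt | tr -d '\r' > file2.txt"

# shlex.split only tokenizes; "|" and ">" come out as ordinary words,
# which is why cat later complains about them as missing files.
args = shlex.split(cmd)
print(args)


def strip_cr_with_tr(src, dst):
    """Hypothetical helper: run tr -d '\\r' on src, writing to dst, no shell."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Python performs the redirection that "<" and ">" would do in a shell,
        # so cat and the pipe are not needed at all.
        subprocess.run(["tr", "-d", r"\r"], stdin=fin, stdout=fout, check=True)
```

Calling strip_cr_with_tr("file1.txt", "file2.txt") would then stand in for the original pipeline; tr streams its input, so the file size should not matter.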

1 Answer

Just do it in Python:

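# Stream the file in fixed-size chunks so it never has to fit in memory.
with open("file1.txt", "rb") as fin, open("file2.txt", "wb") as fout:
    while True:
        data = fin.read(100000)  # read up to 100,000 bytes
        if not data:  # an empty bytes object means end of file
            break
        # Drop every carriage return, as tr -d '\r' would.
        fout.write(data.replace(b"\r", b""))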
Ned Batchelder
  • Would doing `fin.read(fin._CHUNK_SIZE)` be beneficial here? Ever since I noticed files have a `_CHUNK_SIZE` attribute, I've had a hunch that this value might be optimized for optimal disk read operations but never really could mentally justify it since it means more iterations in python-land. – codykochmann Jan 24 '18 at 03:53
  • I've never heard of _CHUNK_SIZE, and don't know how it would affect things. I expect this would be limited by the I/O speed anyway, but you'd have to time it if it really matters. – Ned Batchelder Jan 24 '18 at 12:25
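
For anyone who wants to follow up on the "you'd have to time it" suggestion, here is a rough sketch of how the chunk size could be compared. It wraps the loop from the answer in a function with a chunk_size parameter and times one run per size. The names strip_cr and time_chunk_sizes are made up for illustration, and a single run per size is only a crude measurement, since disk caching can easily dominate the result.

```python
import os
import tempfile
import time


def strip_cr(src, dst, chunk_size=100000):
    """Copy src to dst, dropping every carriage return byte."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            data = fin.read(chunk_size)
            if not data:  # end of file
                break
            fout.write(data.replace(b"\r", b""))


def time_chunk_sizes(src, sizes):
    """Return {chunk_size: seconds} for one run of strip_cr per size."""
    timings = {}
    for size in sizes:
        # Write each run to a throwaway file that is removed afterwards.
        fd, dst = tempfile.mkstemp()
        os.close(fd)
        try:
            start = time.perf_counter()
            strip_cr(src, dst, size)
            timings[size] = time.perf_counter() - start
        finally:
            os.remove(dst)
    return timings
```

Something like time_chunk_sizes("file1.txt", [8192, 100000, 1000000]) would give a first impression; the sizes here are arbitrary picks, not recommendations.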