I'm trying to make a subprocess call from my Python script that replaces the carriage return and newline characters in a file with spaces, and then saves the result back to the same file. I have verified that this works:

cat file.txt | tr '\r\n' ' ' > file.txt

and so tried to do the same thing in python. My call looks like this:

formatCommand = "cat " + fileName + " | tr '\\r\\n' ' ' > " + fileName
print(formatCommand)    #this showed me that the command above is being passed
subprocess.call(formatCommand, shell=True)

Rather than successfully delete the newlines like I expect it to, the file ends up being empty.

I consulted this post about a similar problem, but the solution was to use shell=True which I already employ, and the redirect makes the Popen more complicated. Furthermore, I don't see why it doesn't work with the shell=True.

taronish4
  • Why are you using this in the first place? What you're trying to do in a shell pipeline is just as easy in Python… – abarnert Jul 30 '14 at 19:00
  • This actually results in an empty file for me when I do it directly from a bash shell. – dano Jul 30 '14 at 19:02
  • You're right... I could totally just do a replace call. However, now I'm just curious as to why it doesn't work, so I'll keep the question up. – taronish4 Jul 30 '14 at 19:02
  • @taronish4: dano is right. The problem is that your `tr` invocation is wrong, both from the shell and from Python; it has nothing to do with your use of `subprocess`. – abarnert Jul 30 '14 at 19:02
  • Also, as a side note: You should look into using either `str.format` or `%` rather than concatenation, especially when you have quotes like this all over the place. And raw strings might help too. For example, isn't this more readable (and more obviously right)? `r"cat {} | tr '\r\n' ' ' > {}".format(fileName, fileName)` – abarnert Jul 30 '14 at 19:04
  • Redirections happen very early and the file gets clobbered before anything reads it. Try `cat file.txt > file.txt` for example. With a pipeline I believe you *might* get lucky and win the race between the sub-shells but I don't know that's possible and certainly isn't guaranteed. So don't do that. – Etan Reisner Jul 30 '14 at 19:06
  • Finally, there's almost never a good reason to pipe from `cat`. Instead of `cat spam | foo`, just do `foo < spam`. And if you were doing that to eliminate the race caused by using `file.txt` as both input and output, that's specifically a very bad reason. You aren't actually eliminating it, just slightly narrowing it so that you might not notice that it's there in a quick test. – abarnert Jul 30 '14 at 19:06
  • You could use `sponge` from moreutils package: `< file.txt tr '\r\n' ' ' | sponge file.txt`. Or do it in pure Python: `for line in fileinput.input('file.txt', inplace=1, mode='r+b'): sys.stdout.write(line.translate(maketrans(b'\r\n', b' ')))` – jfs Jul 31 '14 at 16:38

1 Answer

There's a race condition in your shell command. The first command in your pipeline is cat file.txt, the second command is tr '\r\n' ' ' > file.txt. Both commands run in parallel. The first command reads from file.txt, the second truncates file.txt and then writes to it. If the truncation happens before the first command reads from the file, then the file will be empty.
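The truncation step can be reproduced without a shell at all. The following is only a rough sketch, not what the shell literally does: it assumes a throwaway temp file standing in for file.txt, and it uses Python's `open(path, "wb")` as a stand-in for the `> file.txt` redirection, since both truncate the file as soon as it is opened for writing.

```python
import os
import tempfile

# Hypothetical scratch file standing in for file.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"line one\r\nline two\r\n")

size_before = os.path.getsize(path)

# Opening for writing truncates right away, much like the shell's
# `> file.txt`, before a single byte has been read from the file.
writer = open(path, "wb")
size_after_open = os.path.getsize(path)

# A reader that starts after the truncation (the unlucky `cat`) sees nothing.
with open(path, "rb") as reader:
    data_seen_by_reader = reader.read()

writer.close()
os.remove(path)

print(size_before, size_after_open, data_seen_by_reader)
```

If the reader had finished before the writer opened the file, it would have seen the full contents, which is why the original command can appear to work some of the time.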

Ross Ridge
  • note: the redirection `> file.txt` is done by the shell *before* `tr` command is even run. I don't know whether `cat` may ever see non-empty `file.txt` here. – jfs Jul 31 '14 at 16:17
  • The whole `tr` command, including its redirections, is set up in parallel with the `cat` command. So it's possible for the `cat` command to output something, or even finish if the file is smaller than the pipe buffer, before the `tr` redirections occur. For example `echo test > foo.txt; cat foo.txt | tr t T $(sleep 1) > foo.txt; cat foo.txt` will probably print `TesT`. – Ross Ridge Jul 31 '14 at 16:43
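For completeness, here is a minimal pure-Python sketch of the replacement the question is after, which sidesteps the race because the file is read completely before it is reopened for writing. The function name `replace_line_endings` and the temp file are made up for illustration, and it assumes the file fits comfortably in memory.

```python
import os
import tempfile


def replace_line_endings(path):
    """Replace every CR and LF byte in the file at `path` with a space."""
    # Read everything first; the file is closed before it is truncated.
    with open(path, "rb") as f:
        data = f.read()

    # Map both b"\r" and b"\n" to a space (maketrans needs equal lengths,
    # hence the two spaces in the second argument).
    table = bytes.maketrans(b"\r\n", b"  ")

    # Only now reopen for writing, which truncates and rewrites the file.
    with open(path, "wb") as f:
        f.write(data.translate(table))


# Usage with a hypothetical scratch file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"a\r\nb\n")

replace_line_endings(demo_path)

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)  # b'a  b '
```

If the shell pipeline has to stay, writing to a separate temporary file and renaming it over the original afterwards avoids the same problem.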