I am writing a Python script that takes STDIN from TextWrangler and processes it line by line. In TextWrangler, I combine multiple text files using drag and drop. The problem is that the documents retain the ^Z (0x1A) character, which my Python script interprets as an EOF indicator. As a result, my script only "sees" the first of the many combined text documents (up to the first EOF character).

I've researched and read about reading in binary modes, buffers and such, but I'm a complete newbie to this kind of stuff and can't figure out how to implement any of those ideas. It seems that readlines() looks for the EOF and stops. How can I prevent that?

Here is my code:

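import sys

# Read every line that TextWrangler passes to the script on STDIN.
for line in sys.stdin.readlines():
    if len(line) > 4:  # Blank lines are skipped
        fields = line.split()  # Split once and reuse the result
        # Determine if the line begins with an EVENT NUMBER
        if len(fields) > 7 and fields[0].isdigit():
            print(fields[7])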
  • possible duplicate of [Reading binary data from stdin](http://stackoverflow.com/questions/2850893/reading-binary-data-from-stdin) – falsetru Jan 22 '14 at 04:30
  • I'm pretty sure this isn't Python's doing, it's Windows (or, rather, the stdio part of the MSVCRT library) being Windows. Reading stdin in binary mode avoids this problem. If you don't want to do that, you can migrate to Python 3, which doesn't have this problem (since it doesn't directly use stdio the same way Python 2 does). But you're going to have to learn something new to do what you want; there's no way around that. – abarnert Jan 22 '14 at 04:45
  • TextWrangler is a MacOS software package. – WombatPM Jan 22 '14 at 04:56
  • I can't believe in this day and age that there's *any* software that still writes a 0x1A to the end of a file. Not that I doubt you of course, I'm just appalled. – Mark Ransom Jan 22 '14 at 05:07
  • As WombatPM points out, TextWrangler is a MacOS application. And yes, the EDL, created in the 70's, is a text file that is used to finish most every feature film and television program you have seen. – Paul Carlin Jan 29 '14 at 02:45

1 Answer

Option 1: Since you are generating your source files outside Python, just add a step after TextWrangler to remove the offending characters. I've become a big fan of sed and grep. Ports are available for Windows, and both tools are natively available on *nix.

Option 2: Fix the file in TextWrangler.

Option 3: Convert the TextWrangler steps to a Python script and avoid the issue altogether.
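
As a rough illustration of Options 1 and 3 in Python rather than sed, here is a minimal sketch. It assumes the 0x1A bytes actually reach the script when the input is read in binary mode, and that the event lines are whitespace-separated with the wanted value in field 7, as in the question's code. The names `strip_ctrl_z`, `event_fields`, `combine_files`, and `main` are hypothetical helpers, not part of TextWrangler or any library.

```python
import sys

CTRL_Z = b"\x1a"  # the ^Z (0x1A) byte left between the combined documents


def strip_ctrl_z(data):
    """Return the bytes in `data` with every 0x1A byte removed."""
    return data.replace(CTRL_Z, b"")


def event_fields(data, index=7):
    """Return field `index` of each line that begins with a numeric event number."""
    results = []
    # Split on the raw bytes first so only \n, \r, and \r\n count as line breaks.
    for raw_line in strip_ctrl_z(data).splitlines():
        fields = raw_line.decode("utf-8", "replace").split()
        if len(fields) > index and fields[0].isdigit():
            results.append(fields[index])
    return results


def combine_files(paths):
    """Concatenate files opened in binary mode, dropping 0x1A bytes (Option 3)."""
    chunks = []
    for path in paths:
        with open(path, "rb") as handle:
            chunks.append(strip_ctrl_z(handle.read()))
    return b"".join(chunks)


def main():
    # sys.stdin.buffer is the binary stream in Python 3; Python 2 has no
    # .buffer attribute, so fall back to sys.stdin itself there.
    stream = getattr(sys.stdin, "buffer", sys.stdin)
    for field in event_fields(stream.read()):
        print(field)
```

To use this as a filter, call `main()` at the bottom of the script. To skip the drag-and-drop step entirely, pass the result of `combine_files(...)` to `event_fields(...)` instead of reading STDIN.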

WombatPM
  • I would prefer a solution that allows the EOF to be ignored, but it appears that using TextWrangler and a "Text Filter" isn't going to work. I need the speed and ease of drag and drop combining using TextWrangler. The best solution at this point is to "Zap Gremlins" prior to applying the Text Filter. – Paul Carlin Jan 29 '14 at 02:57