
I have written a script to remove excess spaces from a foreign-language text. When I execute the script in the Windows command prompt, I receive no errors and everything looks fine. However, the output file specified in my script is not created, nor is the input file modified. I tried creating a blank document 'corpus_1' for the script to write to. Then I tried just writing back to the input file. Either way, the specified file remains unmodified. How do I get my script to write to a file? What am I missing in my code?

def lettersWhitespace():

    replacements = {'  ':' ', 'c ':'c'}

    with open('C:\\Users\\Charles\\corpus.odt','w+') as infile, open('C:\\Users\\Charles\\corpus_1.odt', 'w') as outfile:
        for line in infile:
            for src, target in replacements.iteritems():
                line = line.replace(src, target)
            outfile.write(line)

EDIT: I believe that I have found the problem. It appears that my first line, 'def lettersWhitespace():' is redundant. As written, the script is defining a function, but not calling that function. Does this sound correct?

Charles R
  • `w+` wipes the file. I hope that wasn't your only copy. – user2357112 Feb 14 '17 at 23:19
  • Thanks for the input. I do have several backups, but I'm not even coming up with a wiped file after running the script. Nothing is being modified. However, when I finally get this to work, should I just have 'w' rather than 'w+'? – Charles R Feb 14 '17 at 23:32

1 Answer


Both w and w+ truncate the file. Suppose you have a file containing a, b, c (each on its own line):

with open('testfile.txt', 'w') as f:
    f.write('a\nb\nc')

If you open it in r mode, you can read the file:

with open('testfile.txt', 'r') as f:
    print(f.read())
# a
# b
# c

If you open it in w+ mode it's truncated (empty):

with open('testfile.txt', 'w+') as f:
    print(f.read())
# 

You probably wanted a "non-truncating" read/write mode that starts at the beginning of the file: r+ (or, if you want the file handle to start at the end of the file, a+):

with open('testfile.txt', 'r+') as outp, open('testfile.txt', 'r') as inp:
    for line in inp:
        line = line.replace('a', 'b')
        outp.write(line)

which modifies the file as you write:

with open('testfile.txt', 'r') as f:
    print(f.read())
# b
# b
# c
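To compare the three read/write modes side by side, here is a minimal sketch. The function name `demo_modes` and the file name `modes_demo.txt` are made up for illustration, and the file is created in a temporary directory so that nothing of yours is touched.

```python
import os
import tempfile

def demo_modes():
    """Return the file contents after writing in r+, a+ and w+ mode."""
    # Hypothetical scratch file in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')
    with open(path, 'w') as f:
        f.write('abc')

    # r+ keeps the contents and starts at the beginning: 'X' overwrites 'a'.
    with open(path, 'r+') as f:
        f.write('X')
    with open(path) as f:
        after_r_plus = f.read()

    # a+ keeps the contents and writes at the end: 'Y' is appended.
    with open(path, 'a+') as f:
        f.write('Y')
    with open(path) as f:
        after_a_plus = f.read()

    # w+ truncates on open, so there is nothing left to read.
    with open(path, 'w+') as f:
        after_w_plus = f.read()

    return after_r_plus, after_a_plus, after_w_plus

print(demo_modes())
```

Only w+ (like w) discards the existing contents; r+ and a+ differ in where writes land.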

A very handy summary of the file modes can be found in this Stack Overflow answer by @And.

MSeifert
  • Thank you. I am a newbie and had understood 'w' as 'write', 'r' as 'read', 'a' as 'append', and '+' as 'read and write'. It's starting to look like there's a bit more to the picture, so I'm going to dig into some more tutorials on file open commands. For the script in question, does it appear that that is indeed the reason that my source text is not being altered? I was also wondering if python has difficulty working with text which is not in .txt files? (I'm currently using .odt files as my input and output.) – Charles R Feb 15 '17 at 02:56
  • @CharlesR On Windows there is a difference between binary files and plain text files, so it might be necessary to open files in `b`-mode (just add a `b` to the mode, like `rb+`). I'm not sure whether `.odt` files are binaries. Also, I'm a bit confused: did it work? Note that if this answer fully answers your question, please don't forget to [accept it](http://stackoverflow.com/help/accepted-answer). – MSeifert Feb 15 '17 at 07:02
  • I still have not found a way to get it to work. Just now I tried changing the mode to rb+ . I also have switched to using a .txt file for testing purposes. In addition, I tried commenting out the nested 'for' statement as well as changing the output method to print(). Still, nothing happens except that my shell accepts the query and gives me a new command line. Here is my most recent test: def lettersWhitespace(): with open('C:\\Users\\Charles\\Test.txt','rb+') as infile: for line in infile: print(line) – Charles R Feb 15 '17 at 07:39
  • If nothing happens, there is nothing to print. That seems to indicate your file is empty. Are you trying to verify my answer (if yes, you need to re-create the file after the `w+` opening), or is this about your original script? – MSeifert Feb 15 '17 at 07:47
  • The new input file ('Test') contains four lines of text. The question is simply, what is the error in my original code which would cause the script to execute without returning either output or error message. I wasn't sure if your answer was saying that the error is in the file mode, so I tried to verify by using various modes. No such changes have yet had an effect on the outcome. I am still running different variations on my original code in order to try to find something that will give me an output. – Charles R Feb 15 '17 at 08:13
  • Can you share your file or a similar file? – MSeifert Feb 15 '17 at 08:13
  • Sure. My test text file reads: Ny ob, t x h a i s \n e e j N E E g \n t x ia l is \n li nts ia l l u b \n However, is it possible that my problem is being caused by starting the script with a 'def' statement? Should I just be starting the script from 'with open(etc.)'? – Charles R Feb 15 '17 at 08:28
  • You do call the function, right? So there is a `lettersWhitespace()` (without indentation) at the end of your script, or it's inside an `if __name__ == '__main__':` block? – MSeifert Feb 15 '17 at 08:35
  • No. I shortened the 'replacements' set dramatically for sake of posting; but otherwise, what I posted was the entire script. – Charles R Feb 15 '17 at 08:45
  • In that case just append (without indentation) a call to the function: `lettersWhitespace()` – MSeifert Feb 15 '17 at 09:02
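
Putting the thread's two findings together (the function was defined but never called, and `w+` would wipe the input), a corrected script might look like the sketch below. It is only an illustration under some assumptions: the names `letters_whitespace`, `REPLACEMENTS`, `in_path` and `out_path` are made up here, the demo files are plain text in a temporary directory rather than the asker's `.odt` files, and `.items()` replaces the Python 2-only `.iteritems()` so the code runs on Python 3.

```python
import os
import tempfile

# Replacement pairs taken from the question; applied in insertion order.
REPLACEMENTS = {'  ': ' ', 'c ': 'c'}

def letters_whitespace(in_path, out_path):
    """Copy in_path to out_path, applying REPLACEMENTS to every line."""
    # 'r' leaves the input untouched; 'w' creates (or truncates) the output.
    with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
        for line in infile:
            for src, target in REPLACEMENTS.items():
                line = line.replace(src, target)
            outfile.write(line)

if __name__ == '__main__':
    # Hypothetical demo files; substitute your own paths here.
    demo_dir = tempfile.mkdtemp()
    in_path = os.path.join(demo_dir, 'corpus.txt')
    out_path = os.path.join(demo_dir, 'corpus_1.txt')
    with open(in_path, 'w') as f:
        f.write('a  b c d\n')

    # Defining the function is not enough: without this call nothing runs.
    letters_whitespace(in_path, out_path)

    with open(out_path) as f:
        print(repr(f.read()))
```

Reading from one file and writing to a different one avoids having two handles open on the same file, and the unindented call (here inside the `if __name__ == '__main__':` block) is what actually executes the function body when the script is run.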