
Suppose I have a file (say file1.txt) containing about 3 MB of data or more. If I want to write this data to a second file (say file2.txt), which of the following approaches is better?

Language used: Python 2.7.3

Approach 1:

file1_handler = file("file1.txt", 'r')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler = file("file2.txt", 'a')
    file2_handler.write(line)
    file2_handler.write('\r\n')
    file2_handler.close()
file1_handler.close()

Approach 2:

file1_handler = file("file1.txt", 'r')
file2_handler = file("file2.txt", 'a')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler.write(line)
    file2_handler.write('\r\n')
file2_handler.close()
file1_handler.close()

I think approach two will be better because you just have to open and close file2.txt once. What do you say?

John Kugelman
Hemant
  • Open a file with [open](http://docs.python.org/2/library/functions.html#open), not with [file](http://docs.python.org/2/library/functions.html#file). – Matthias Mar 20 '13 at 13:23

3 Answers


Use `with`; it closes the files automatically for you, even if an exception is raised inside the block:

with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for lines in in_file:
        line = lines.strip()
        # Perform some operation
        out_file.write(line)
        out_file.write('\r\n')

Use open instead of file: the Python 2 documentation recommends open for opening files, and file was removed in Python 3.

And yes, reopening file2.txt for every line of file1.txt is wasteful: every iteration pays for an extra open, flush, and close.
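To put a rough number on that difference, here is a minimal timing sketch. The helper names (`copy_reopening`, `copy_open_once`, `timed`) and the temporary files are made up for this illustration, it writes a plain `'\n'` to keep the newline question out of the picture, and the absolute timings will vary with the machine and filesystem.

```python
import os
import tempfile
import time


def copy_reopening(src, dst):
    """Approach 1: reopen the output file for every input line."""
    with open(src, 'r') as in_file:
        for raw in in_file:
            with open(dst, 'a') as out_file:
                out_file.write(raw.strip() + '\n')


def copy_open_once(src, dst):
    """Approach 2: open the output file once for the whole loop."""
    with open(src, 'r') as in_file, open(dst, 'a') as out_file:
        for raw in in_file:
            out_file.write(raw.strip() + '\n')


def timed(func, src, dst):
    """Return the wall-clock seconds taken by func(src, dst)."""
    start = time.time()
    func(src, dst)
    return time.time() - start


# Build a small sample input in a temporary directory (hypothetical data).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file1.txt')
with open(src, 'w') as f:
    for n in range(2000):
        f.write('line %d\n' % n)

dst1 = os.path.join(workdir, 'out_reopen.txt')
dst2 = os.path.join(workdir, 'out_once.txt')
t1 = timed(copy_reopening, src, dst1)
t2 = timed(copy_open_once, src, dst2)
print('reopen per line: %.4f s, open once: %.4f s' % (t1, t2))
```

Both functions produce the same output file; only the number of open and close calls differs.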

Pavel Anossov
  • I was writing the same thing :) @Hemant, look at: http://docs.python.org/2/whatsnew/2.5.html#pep-343-the-with-statement – Francesco Frassinelli Mar 20 '13 at 13:17
  • Regarding f2.write('\r\n'): to do this you need to open f2 as a binary file (by appending "b" to the mode flag). – Francesco Frassinelli Mar 20 '13 at 13:19
  • Oops! I thought open had been deprecated :p (I didn't read the documentation properly). So does the writing speed increase? Approach one was taking almost 2 hours to copy 1 MB of data. – Hemant Mar 20 '13 at 13:19
  • @Francesco: can you please explain this binary file approach a bit? – Hemant Mar 20 '13 at 13:21
  • @Hemant: look at: http://stackoverflow.com/questions/2536545/how-to-write-unix-end-of-line-characters-in-windows-using-python If you write "\n" without binary mode, Python will use the appropriate line ending for your platform. If you want a particular line ending, you should use the binary flag. – Francesco Frassinelli Mar 20 '13 at 13:21
  • This is relevant only on Windows. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. – Pavel Anossov Mar 20 '13 at 13:23
  • @Hemant: you could consider implementing a buffer so that you make a few big writes rather than many little ones. You could just append your lines to a list and, when the list gets big enough, write them all. – Francesco Frassinelli Mar 20 '13 at 13:26
  • @Francesco: the reason for using '\r\n' is that I want each line from file1 to go on its own line in file2. Does binary mode do this, or does it simply keep '\r\n' as it is? – Hemant Mar 20 '13 at 13:27
  • @Hemant: yes. \r\n is for Windows; on Linux/Mac you have \n. If you're on Linux and you want to write Windows line endings (for example), you have to use binary mode and write \r\n. – Francesco Frassinelli Mar 20 '13 at 13:29
  • @Francesco: I'll try to implement this buffer approach :) – Hemant Mar 20 '13 at 13:29
  • Binary mode has no effect on Unix. It writes what it's told to write. – Pavel Anossov Mar 20 '13 at 13:30
  • @Francesco: I didn't follow you. Which question is your 'yes' answering: converting to the platform newline, or writing '\r\n' as it is? – Hemant Mar 20 '13 at 13:31
  • Text mode on Windows converts `\r` and `\n` to `\r\n`. `\r\n` is left unchanged. – Pavel Anossov Mar 20 '13 at 13:31
  • @PavelAnossov thank you for the information. Hemant: PavelAnossov answered correctly :) – Francesco Frassinelli Mar 20 '13 at 13:34
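
The binary-mode point from the comments can be checked with a small sketch. It uses only the standard library; the temporary file path and the two sample lines are made up for illustration. In binary mode the bytes reach the disk exactly as written, on any platform, so an explicit `\r\n` stays `\r\n`.

```python
import os
import tempfile

# Hypothetical output file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'file2.txt')

# Binary mode: no newline translation, so each line ends with exactly CR LF.
with open(path, 'ab') as out_file:
    for line in ['first', 'second']:
        out_file.write(line.encode('ascii') + b'\r\n')

# Read the raw bytes back to inspect what was actually stored.
with open(path, 'rb') as check:
    data = check.read()
print(repr(data))
```

In text mode, by contrast, Python on Windows translates a written `\n` into `\r\n`, so writing just `\n` there is enough to get Windows line endings.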

I was recently doing something similar (if I understood you correctly). How about:

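# Avoid naming a variable "file": that shadows the Python 2 built-in
in_file = open('file1.txt', 'r')
# 'wt' truncates file2.txt first; use 'a' instead to append as in the question
out_file = open('file2.txt', 'wt')

for line in in_file:
    new_line = line.strip()

    # You can do your operation here on new_line

    out_file.write(new_line)
    out_file.write('\r\n')

in_file.close()
out_file.close()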

This approach works like a charm!

Oscar_Mariani

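My solution (derived from Pavel Anossov's answer, plus buffering):

dim = 1000  # number of lines to collect before each batch of writes
buf = []    # not named "buffer", which is a built-in in Python 2
with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for i, lines in enumerate(in_file):
        line = lines.strip()
        # Perform some operation
        buf.append(line)
        if i % dim == dim - 1:
            for bline in buf:
                out_file.write(bline)
                out_file.write('\r\n')
            buf = []
    # Write whatever is left when the line count is not a multiple of dim
    for bline in buf:
        out_file.write(bline)
        out_file.write('\r\n')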

Pavel Anossov gave the right solution first: this is just a suggestion ;) There is probably a more elegant way to implement this. If anyone knows one, please share it.
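
One possibly simpler variant, offered as a sketch rather than a definitive answer: the file objects returned by open are already buffered, so handing writelines a generator avoids the manual list. The `process` and `copy_lines` names are hypothetical stand-ins, the sample files exist only so the snippet runs on its own, and it writes `'\n'` rather than `'\r\n'` to stay out of the newline discussion.

```python
import os
import tempfile


def process(line):
    """Hypothetical placeholder for the per-line operation."""
    return line.strip()


def copy_lines(src, dst):
    """Append the processed lines of src to dst, one per line."""
    with open(src, 'r') as in_file, open(dst, 'a') as out_file:
        # The generator is consumed lazily; the file object's own buffer
        # batches the small writes into larger ones.
        out_file.writelines(process(raw) + '\n' for raw in in_file)


# Build a tiny sample input (made-up data) and run the copy.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file1.txt')
dst = os.path.join(workdir, 'file2.txt')
with open(src, 'w') as f:
    f.write('  alpha  \nbeta\n\ngamma')

copy_lines(src, dst)
with open(dst, 'r') as f:
    result = f.read()
print(repr(result))
```

If a specific buffer size matters, open also accepts a buffering argument, though the default is usually adequate for a file of a few megabytes.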

Francesco Frassinelli