
What can I do to optimize this function and make it more Pythonic?

def flatten_rows_to_file(filename, rows):
    f = open(filename, 'a+')
    temp_ls = list()
    for i, row in enumerate(rows):
        temp_ls.append("%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n" % row)
        if i and i % 100000 == 0:
            f.writelines(temp_ls)
            temp_ls = []
    f.writelines(temp_ls)
    f.close()
WindyYang

2 Answers


A few things that come to mind immediately:

  1. Use a with statement, rather than manually closing your file.
  2. Pass a generator expression to f.writelines rather than building up a 100,000-row list over and over (let the standard library decide how much, if any, output to buffer); see the sketch after this list.
  3. Or, better yet, use the csv module to handle writing your tab-separated output.
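
For the second point, a minimal sketch that keeps your original format string but streams the lines through a generator expression might look like this:

def flatten_rows_to_file(filename, rows):
    with open(filename, 'a') as f:
        # No intermediate list: writelines consumes the generator lazily
        f.writelines(
            "%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n" % row
            for row in rows
        )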

Here's a quick stab at some improved code using the csv module:

from csv import DictWriter

def flatten_rows_to_file(filename, rows):
    with open(filename, 'ab') as f:
        writer = DictWriter(f, ['id','price','site_id','rating','shop_id'],
                            delimiter='\t')
        writer.writerows(rows)

Note that if you're using Python 3, you need slightly different code for opening the file: use mode 'a' rather than 'ab' and add the keyword argument newline="". You didn't need the + in the mode you were using (you're only writing, not both reading and writing).
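In Python 3 that would look something like this (text mode, with newline='' so the csv module controls the line endings):

def flatten_rows_to_file(filename, rows):
    with open(filename, 'a', newline='') as f:
        writer = DictWriter(f, ['id', 'price', 'site_id', 'rating', 'shop_id'],
                            delimiter='\t')
        writer.writerows(rows)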

If the dictionaries in your rows argument may have extra keys beyond the ones you're writing, you'll need to pass an extra argument to the DictWriter constructor as well.
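For example, assuming you just want any unexpected keys silently dropped, DictWriter accepts extrasaction='ignore' (the default, 'raise', raises a ValueError when a row has extra keys):

writer = DictWriter(f, ['id', 'price', 'site_id', 'rating', 'shop_id'],
                    delimiter='\t', extrasaction='ignore')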

Blckknght
  • What if rows is 1 GB or larger, and your computer only has 1 GB of memory? – WindyYang Oct 22 '13 at 10:45
  • @user2000477: Is `rows` some kind of generator then? If not, you'll already have run out of memory before the code ever runs. If it is a generator, either `f.writelines` or `csv.DictWriter.writerows` will do the right thing, keeping only the data for a single line in memory while it is being written. They essentially do the loop in tobias_k's answer (though in C code, so probably faster). – Blckknght Oct 22 '13 at 19:55

It is generally a good idea to use the with statement to make sure the file is closed properly. Also, unless I'm mistaken, there should be no need to buffer the lines manually. You can just as well pass a buffer size (in bytes) to open(), which determines how often the output is flushed.

def flatten_rows_to_file(filename, rows, buffsize=100000):
    with open(filename, 'a+', buffsize) as f:
        for row in rows:
            f.write("%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n" % row)
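
Either way the rows are written as they are produced, so if memory is a concern you can pass a generator rather than a full list. A hypothetical example (iter_rows and its values are made up for illustration):

def iter_rows():
    # hypothetical producer: yields one dict at a time instead of building a list
    for i in range(1000000):
        yield {'id': i, 'price': 9.99, 'site_id': 1, 'rating': 5, 'shop_id': 42}

flatten_rows_to_file('prices.tsv', iter_rows())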
tobias_k