I am testing the speed of different ways of appending data to the end of a file.

The file is saved as a .txt. Its contents are a list of dicts.

Example dict: {'posted': ['2020-09-06T22:27:56.849149+00:00', '2020-09-06T22:27:56.849149+00:00'], 'seller_name': ['cheesetoken', 'cheesetoken'], 'seller_is_NPC': [False, False], 'Listings_sold': 2, 'quality': 2, 'price': 0.256, 'quantity_sold': 296554, 'datetime': datetime.datetime(2020, 9, 7, 0, 22, 27, 490902)} I'll shorten this to {data} for simplicity's sake.

The file will keep growing over time. It is currently 1MB in size and will increase by approx 1MB every 14-21 days.

I want to append data to this list. The data I want to append will itself be a list. If I had [{data1},{data2},{data3},{data4}] saved to disk already and I wanted to append [{data5},{data6}], I'd want to be able to easily read the data back (it doesn't have to be saved like this) as [{data1},{data2},{data3},{data4},{data5},{data6}].

My original code to do this was:

    import os

    for x in formatted_sell_list:
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop)
        if len(x) > 0:

            try:
                # Read the whole file and rebuild the list of dicts
                with open(os.path.join(path, str(file_name)) + '.txt', "r") as file1:
                    data = eval(file1.read())

            except:
                # print('Error no file to read: ' + str(db_file_name) + '.txt')
                data = []

            data = data + content

            # Overwrite the file with the extended list
            with open(os.path.join(path, str(file_name)) + '.txt', "w") as file1:
                file1.write(str(data))

        loop = loop + 1

I felt this was probably quite inefficient: it reads the entire file, evals it, appends to the list, and overwrites the file. I decided that appending line by line might work better, so I used this:

    import os

    for x in formatted_sell_list:
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop) +' NEW'
        if len(x) > 0:

            for write_me in content:
                # Open the file in append & read mode ('a+')
                with open(os.path.join(path, str(file_name)) + '.txt', "a+") as file_object:

                    # Append text at the end of file
                    file_object.write(str(write_me))
                    file_object.write("\n")

        loop = loop + 1

I ran these alongside each other and timed how long each section of code took using time.time(). I found that in 100% of cases (file sizes between 1KB and 1.3MB) the old method was faster. On average it ran 4.5 times faster. Further testing showed that the most time-intensive part of the second piece of code was, by far, opening the file.
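For reference, a comparison like the one described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper names (`append_open_per_line`, `append_open_once`, `time_call`), the temporary directory, and the made-up rows are hypothetical and are not the code that was actually timed. It uses `time.perf_counter()` rather than `time.time()` because it is intended for measuring short durations.

```python
import os
import tempfile
import time


def append_open_per_line(path, rows):
    # Reopens the file for every row, like the second snippet above.
    for row in rows:
        with open(path, "a") as fh:
            fh.write(str(row) + "\n")


def append_open_once(path, rows):
    # Opens the file a single time and writes every row through one handle.
    with open(path, "a") as fh:
        for row in rows:
            fh.write(str(row) + "\n")


def time_call(func, *args):
    # Returns the elapsed wall-clock time of one call, in seconds.
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


# Made-up rows for illustration only; real rows would be the sale dicts.
rows = [{"quality": 2, "price": 0.256, "quantity_sold": n} for n in range(500)]

with tempfile.TemporaryDirectory() as tmp:
    per_line_path = os.path.join(tmp, "per_line.txt")
    once_path = os.path.join(tmp, "once.txt")
    per_line_seconds = time_call(append_open_per_line, per_line_path, rows)
    once_seconds = time_call(append_open_once, once_path, rows)
    # Read both files back to confirm the two approaches wrote the same text.
    with open(per_line_path) as fh:
        per_line_text = fh.read()
    with open(once_path) as fh:
        once_text = fh.read()

print(f"open per line: {per_line_seconds:.4f}s, open once: {once_seconds:.4f}s")
```

Both helpers should produce identical files, so any timing difference comes from how often the file is opened rather than from what is written. Actual numbers will vary by machine and disk.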

Any suggestions to make this code faster/more efficient would be hugely appreciated.

Edited code:

    for x in formatted_sell_list:
        # print('loop = ' + str(loop))
        content = x.copy()
        file_name = str(db_number) + '- Q' + str(loop) +' NEW'
        # print('Writing to ' + str(db_file_name) + ", " + str(content))
        if len(x) > 0:

            # Open the file in append & read mode ('a+')
            with open(os.path.join(r'C:\Users\PC\PycharmProjects\Simcompanies\Files\RecordedSales2',
                                   str(file_name)) + '.txt', "a+") as file_object:

                for write_me in content:

                    # Append text at the end of file
                    file_object.write(str(write_me))
                    file_object.write("\n")

F1rools22
  • Try opening your file outside of the for loop. Writing multiple times will still result in an append. – smcjones Sep 24 '20 at 01:26
  • @smcjones Thought of this immediately after posting- Duh! It's no faster, in fact it's a little slower, altho not by much at all. Do you think it's more efficient in any way? – F1rools22 Sep 24 '20 at 01:29
  • @F1rools22 Post the edited code (with the open outside of the for loop) so that we can be sure that you aren't getting something else wrong. – user202729 Sep 24 '20 at 01:31
  • Edited code posted. – F1rools22 Sep 24 '20 at 01:45
  • Based on your code your string input is not too big to hold in memory. So, just append write it all at once (concatenate your list if you need to, joining with new lines). I/O is expensive. Limiting write operations is a plus. If you can’t limit it in your second version stick to the first one. – smcjones Sep 24 '20 at 01:49
  • @smcjones I'm afraid I don't quite understand. If you mean 'formatted_sell_list', it is a list of 9 lists, some of which are empty. If you mean the variable of 'content' then that is just the example dict I provided above. I'm not sure how I would write all lines at once tho for every item in the list, could you give a code example of how to do that if thats what you meant? – F1rools22 Sep 24 '20 at 01:52
  • Wrote an answer. TBH still not 100% sure what it is you’re doing but what I wrote is roughly equivalent to your for loop where you write a line then write a newline. If it’s not getting you any closer, one read and one write is better than n writes, where n is greater than 2. – smcjones Sep 24 '20 at 01:56
  • They mean "outside of the for loop" -- i.e. the outer for loop. The file name does not depend on x right?... – user202729 Sep 24 '20 at 02:14

1 Answer

I/O operations are expensive.

Keep your writing to a minimum. Format your list into the string format you want and then perform one write operation.

Something like this:

    # Open in append mode ("a"); the default mode is read-only and write() would fail
    with open(file, "a") as fh:
        fh.write('\n'.join(map(str, content)) + '\n')

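To make the idea concrete, here is a minimal sketch of appending in a single write and reading the rows back one line at a time. The function names (`append_rows`, `read_rows`), the temporary file, and the tiny example dicts are assumptions for illustration only. Reading back uses `eval`, as the question does, which is only reasonable for files you wrote yourself. Rows containing `datetime.datetime(...)` values, like the question's, would also need `import datetime` for `eval` to succeed.

```python
import os
import tempfile


def append_rows(path, rows):
    # One write call: each row on its own line, plus a trailing newline
    # so the next append starts on a fresh line.
    if not rows:
        return
    with open(path, "a") as fh:
        fh.write("\n".join(map(str, rows)) + "\n")


def read_rows(path):
    # Rebuild the list from one dict literal per line, skipping blank lines.
    # eval mirrors the question's approach; only use it on trusted files.
    with open(path) as fh:
        return [eval(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    append_rows(demo_path, [{"id": 1}, {"id": 2}])
    append_rows(demo_path, [{"id": 3}])
    loaded = read_rows(demo_path)

print(loaded)
```

With this layout, appending never needs to read the existing file, and the full list is only rebuilt when the data is actually loaded.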
Gershom Maes
smcjones
  • Python buffers the `write`s by default. I doubt this will be much faster. https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file – user202729 Sep 24 '20 at 02:13
  • Plus, you need a `+'\n'` at the end. – user202729 Sep 24 '20 at 02:13
  • Let Python optimize writing as it sees fit. This should be cheaper than the first iteration which was one read plus one write, and cheaper than the second iteration which was no read but one write per line. Added the trailing newline. – smcjones Sep 24 '20 at 03:21
  • What does the + '\n' at the end do? – F1rools22 Sep 24 '20 at 13:50
  • Your code wrote a newline at the end of every line. It means appending in the future will be on a new line. – smcjones Sep 24 '20 at 14:03