
I am writing a piece of code that generates new parameter values inside a nested FOR loop and stores these values to a file. The iteration count can go as high as 10,000 × 100,000. I store the variable values in a string, which gets appended with newer values on every iteration. Finally, at the end of the loops, I write the complete string to a .txt file.

op = open("output file path", "w+")
totresult = ""
for n in range(seconds):  # this user-input parameter can be up to 100,000
    result = ""
    for car in cars_running:  # number of cars can be 10,000
        # Code to check if the given car is in range of another car
        ...
        # if the car is in range of another car
        if distance < 1000:
            result = getDetailsofOtherCar()
            totresult = totresult + carName + result
# end of loops
op.write(totresult)
op.close()

My question here is: is there a better, more Pythonic way to perform this kind of logging? I am guessing the string gets very bulky in the later iterations and may be causing the delay in execution. Is a string the best option for accumulating the values, or should I consider other Python data structures such as a list or array? I came across Python's logging module but would like to get an opinion before switching to it.
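
For reference, here is roughly what the logging-module version would look like; this is a minimal sketch, and the file name, level, and message format are arbitrary choices rather than anything from my actual code:

    import logging

    # Route log records straight to a file; the format keeps only the message.
    logging.basicConfig(filename="output.log", level=logging.INFO,
                        format="%(message)s")

    for second in range(10):          # stand-in for the real loop
        logging.info("car%d: in range at t=%d", 1, second)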

I tried looking up similar issues but found nothing matching my current problem.

Open to any suggestions.

Thank you

Edit: code added

akzhere
  • Eventually you should show your code if you want it optimised. – sehigle Dec 20 '18 at 09:10
  • Code added, hope this makes sense. Rather than a code-specific optimization, I am looking for a general comment on whether a string is the best option for this kind of operation. – akzhere Dec 20 '18 at 09:38
  • I don't think there is a Pythonic way. The normal approach would be to write into a buffer and have a thread writing this buffer out to disk (if you want to optimize the I/O). Maybe this helps: https://docs.python.org/3/howto/logging-cookbook.html – sehigle Dec 20 '18 at 09:44
  • You should profile your code and see where it's spending the majority of its time. If you don't, you may be wasting your time speeding up something that won't make a difference. It could also surprise you. Anyway, it's easy to do in Python: see [How can you profile a Python script?](https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) – martineau Dec 20 '18 at 10:15
  • @sehigle I will try to work on the thread approach. – akzhere Dec 20 '18 at 10:20
  • @martineau, yes, profiling should be the right way, but my current run has been going overnight and it would be a waste to terminate it abruptly now. I have not done profiling before, so I am a little skeptical whether it can help when a third-party tool is used to fetch the vehicle data in my case. Thanks! – akzhere Dec 20 '18 at 10:25
  • It will tell you if that's where most of the time is being spent, which seems like it could be useful to know... – martineau Dec 20 '18 at 10:27
  • 2
    I've read it's better to avoid string concatenation and instead build a list of string components and the `''.join()` them all together once at the end. That may do nothing to speed your program up however, since we don't know if that's a bottleneck or not. – martineau Dec 20 '18 at 10:31
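
A minimal sketch of that list-and-join pattern (the `distance` function, loop bounds, and record format are made-up stand-ins, not anything from the question's code):

    def distance(car):                # hypothetical stand-in for the range check
        return (car * 37) % 2000

    num_seconds = 10                  # user input, up to 100,000 in the question
    cars_running = range(100)         # fleet, up to 10,000 cars in the question

    parts = []                        # collect fragments instead of concatenating
    for second in range(num_seconds):
        for car in cars_running:
            if distance(car) < 1000:  # hypothetical "in range" check
                parts.append(f"car{car}: in range at t={second}\n")

    with open("output.txt", "w") as op:
        op.write("".join(parts))      # one join, one write

Appending to a list is amortised O(1), while `totresult = totresult + ...` copies the whole accumulated string on every iteration, so the concatenation cost grows quadratically with the number of records.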

1 Answer


You can write to the file as you go, e.g.:

with open("output.txt", "w") as log:
    for i in range(10):
        for j in range(10):
            log.write(str((i, j)))  # each record goes through the file's internal buffer
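
If the records can be produced lazily, `writelines` accepts any iterable of strings, so the full output never has to sit in memory at once (same toy records as above):

    with open("output.txt", "w") as log:
        # Generator expression: each record is written as it is produced
        log.writelines(f"({i},{j})\n" for i in range(10) for j in range(10))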

Update: whether directly streaming the records is faster than concatenating them in a memory buffer depends crucially on how big the buffer becomes, which in turn depends on the number of records and the size of each record. On my machine the crossover seems to kick in around 350 MB.

[Figure: benchmark plot comparing direct streaming against in-memory concatenation as the buffer size grows]
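
A rough harness for reproducing the comparison (a sketch: `N` and the record format are arbitrary, and the crossover point will vary by machine):

    import timeit

    N = 1_000_000  # number of records; raise this to grow the in-memory buffer

    def streamed():
        with open("stream.txt", "w") as f:
            for i in range(N):
                f.write(f"record {i}\n")   # write each record immediately

    def buffered():
        parts = [f"record {i}\n" for i in range(N)]
        with open("buffered.txt", "w") as f:
            f.write("".join(parts))        # build everything, write once

    print("streamed:", timeit.timeit(streamed, number=3))
    print("buffered:", timeit.timeit(buffered, number=3))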

Joe Halliwell