I am sniffing network packets using Tshark (the command-line version of Wireshark) and writing them to a file as I receive them. My code is similar to the following:

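import json
from queue import PriorityQueue

documents = PriorityQueue(maxsize=0)  # maxsize=0 means the queue is unbounded
writing_enabled = True  # cleared by another thread to stop the writer
with open("output.txt", 'w') as opened_file:
    while writing_enabled:
        try:
            data = documents.get(timeout=1)
        except Exception as e:
            # No document pushed by the producer thread within the timeout
            continue
        opened_file.write(json.dumps(data) + "\n")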

When I receive packets from the Tshark thread, I put them into a queue, and another thread writes them to a file using the code above. However, once the file grows beyond 600 MB, the process slows down and its status changes to Not Responding. After some research, I suspect this is caused by the default buffering of the `open` function. Is it reasonable to change `with open("output.txt", 'w') as opened_file:` to `with open("output.txt", 'w', 1000) as opened_file:` in order to use a 1000-byte buffer in write mode? Or is there another way to overcome this?

I.K.
  • @Alderven Actually, I am opening the file only once. `open` is called once, so I don't think I overwrite it each time. – I.K. Sep 24 '19 at 11:54
  • Have you tried to flush the buffer? https://www.tutorialspoint.com/python/file_flush.htm – Chris Sep 24 '19 at 11:55
  • Totally unrelated, but you definitely want to either restrict your `except` clause to the exact exception(s) you expect here, or at least log the exceptions you catch. Otherwise, if something unexpected happens, you will never know. – bruno desthuilliers Sep 24 '19 at 12:06
  • How does the `writing_enabled` flag change its value? Is there a thread that changes it? – Akash Pagar Sep 24 '19 at 12:15
  • @brunodesthuilliers Thank you, I will take care of that ^^ – I.K. Sep 24 '19 at 12:16
  • @AkashPagar Its value is set by a thread in another class, but that is off-topic, so I didn't include it. Thank you for pointing it out. – I.K. Sep 24 '19 at 12:17

1 Answer

To write the internal buffer to the file, you can call the file object's `flush` method. However, this is normally handled for you: Python picks a default buffer size (usually based on the block size of the underlying device) and flushes automatically whenever the buffer fills. If you want to specify your own buffer size, you can open the file like this:

f = open('file.txt', 'w', buffering=bufsize)

Please also see the following question: How often does Python flush to file
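
A minimal sketch of both options combined, not a definitive implementation: the file is opened with an explicit buffer size and `flush()` is called after every batch of lines rather than after each one. The function name `write_documents`, the file name `buffered_output.txt`, the 64 KiB buffer, and the batch size of 100 are all illustrative assumptions, not values from the question.

```python
import json
import os
import tempfile


def write_documents(path, docs, bufsize=64 * 1024, flush_every=100):
    """Write docs as JSON lines using an explicit buffer size.

    bufsize and flush_every are illustrative defaults (assumptions).
    """
    with open(path, "w", buffering=bufsize) as f:
        for count, doc in enumerate(docs, start=1):
            f.write(json.dumps(doc) + "\n")
            if count % flush_every == 0:
                # Hand Python's internal buffer over to the operating system.
                f.flush()
    # Leaving the with-block flushes any remaining data and closes the file.
    return path


# Usage: write 250 small hypothetical documents to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "buffered_output.txt")
write_documents(demo_path, [{"packet": i} for i in range(250)])
```

Note that `flush()` only passes the data to the operating system. If the data must actually be on disk at a given point, `os.fsync(f.fileno())` can be called after the flush, at a noticeable performance cost.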

As an alternative to flushing the buffer, you could use rolling files, i.e. open a new file once the currently open file exceeds a certain size. This is generally good practice if you intend to write a lot of data.
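
The rolling-file idea could be sketched roughly as follows. `RollingWriter`, the `output.N.txt` naming scheme, and the size limits are hypothetical choices made for illustration; the sketch rolls over right after the write that reaches the limit, so a file can exceed the limit by at most one line.

```python
import json
import os
import tempfile


class RollingWriter:
    """Write lines to numbered files, starting a new file once a size limit is reached.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, directory, prefix="output", max_bytes=100 * 1024 * 1024):
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.index = 0
        self.written = 0
        self.file = None
        self._open_next()

    def _open_next(self):
        # Close the current file (if any) and open the next numbered one.
        if self.file is not None:
            self.file.close()
        self.index += 1
        name = f"{self.prefix}.{self.index}.txt"
        # Fixed encoding and newline so the byte count below matches the file size.
        self.file = open(os.path.join(self.directory, name), "w",
                         encoding="utf-8", newline="\n")
        self.written = 0

    def write_line(self, line):
        data = line + "\n"
        self.file.write(data)
        self.written += len(data.encode("utf-8"))
        if self.written >= self.max_bytes:
            self._open_next()

    def close(self):
        self.file.close()


# Usage: a tiny 200-byte limit so that the rollover is visible with little data.
roll_dir = tempfile.mkdtemp()
writer = RollingWriter(roll_dir, max_bytes=200)
for i in range(50):
    writer.write_line(json.dumps({"packet": i}))
writer.close()
rolled_files = sorted(os.listdir(roll_dir))
```

If routing the output through the `logging` module is acceptable, the standard library's `logging.handlers.RotatingFileHandler` provides similar size-based rollover without custom code.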

Chris
  • Thank you for your answer. It is not related, but after hours of debugging it turned out that the Not Responding status is caused by an overloaded GUI component. However, I will also try your solution with timers to measure the performance difference. – I.K. Sep 24 '19 at 15:51