
I am writing data to a new file. The data comes from a Flask request (a file upload), and I want to give the user the option to cancel while the file is being written. For that I use a process and an event, and pass the required arguments to the process.

Read file

file = request.files.get("file")

method-1: contents = file.stream.read()

method-2: contents = file.stream.readlines()

For example, the file is train.csv (10 MB in size).

Write file

def __init__(self, filename: str, contents: Any, read_length: int) -> bool:
 
 file: Path = DIRPATH / self.filename
 
 method-1:
       with open(file, "wb") as fp:
          write_length = fp.write(self.contents)
          
          if self.event.wait(0.4):
                 break

Result for method-1: The entire file is written in one go, so my cancel option is useless. But writing is very fast and takes only a few seconds.

method-2:
      with open(file, "wb") as fp:
         for line in self.contents:
             cnt = fp.write(line)
             write_length += cnt        
             
             if self.event.wait(0.4):
                break
             else:
                continue

Result for method-2: The file is written line by line and I am able to cancel the write successfully, but writing is significantly slower and takes many minutes.
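A rough way to see where the time in method-2 goes: `event.wait(0.4)` blocks for up to 0.4 seconds whenever the event is not set, and it is called once per line. The sketch below only does that arithmetic. The helper name and the line count are hypothetical, and it assumes the event is never set, so every call waits for the full timeout.

```python
def estimated_wait_seconds(iterations: int, timeout: float = 0.4) -> float:
    """Upper bound on the time spent blocked in event.wait(timeout).

    Assumes the event is never set, so each call blocks for the full timeout.
    """
    return iterations * timeout


# Hypothetical line count, for illustration only.
lines_in_file = 1_000
print(f"up to {estimated_wait_seconds(lines_in_file):.0f} s spent waiting")
```

Under that assumption the waiting time grows with the number of loop iterations, not with the file size, so fewer and larger writes mean fewer waits.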

Is there a way to write a reasonably large chunk to the file before waiting for the event, so that writing is faster while still using read() or readlines()?

winter
  • As the file is not written until Flask has received it completely, I don't see any advantage in allowing the user to cancel the process, as the upload itself has already happened. If you still want to use the above methods, what happens if you reduce the wait to, say, 0.1? – DobbyTheElf Feb 21 '23 at 12:30
  • Hi Dobby, thanks. I am a newbie, and what I am trying to do is avoid writing the file inside the request. The request comes in and the data is passed to a background process, which writes it to the file. After that, new requests are sent at a fixed interval, say 2 seconds, to check whether the process has finished. I tried setting the wait time to 0.1, but that did not cancel the task (maybe it is too quick); with 0.3 or 0.4 it does detect the event and cancel the task. – winter Feb 21 '23 at 13:09
  • If the cancel process is in the same thread as the file write process, then it cannot work. One of the answers for this question shows how to use multiprocessing to remove a file as a background process. You could adapt it to write the file instead: https://stackoverflow.com/questions/24612366/delete-an-uploaded-file-after-downloading-it-from-flask – DobbyTheElf Feb 22 '23 at 08:54
  • Hi Dobby, thanks for the input. I solved it by reading in chunks. – winter Feb 22 '23 at 17:06

1 Answer


To solve this I used method-1 combined with a generator (I found the idea in one of the SO questions). file.stream.read() returns all the bytes at once; from those, take a chunk of 1024 or 4096 bytes as needed, write that chunk to the file, and repeat until all the bytes have been written. This way the writing speed is good compared to method-2.

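    def read_in_chunks(self, block_size=1024):
        """Yield successive block_size-byte slices of self.contents."""
        start = 0
        while True:
            # Clamp the end of the slice to the total number of bytes read.
            stop = min(start + block_size, self.read_length)
            data = self.contents[start:stop]
            # An empty slice means every byte has been yielded.
            if not data:
                break
            yield data
            start = stop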
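
For completeness, here is a minimal sketch of how a chunk generator like this could drive a cancellable write. It is not the exact code from my project: `iter_chunks`, `write_cancellable`, and the 64 KiB default block size are hypothetical names and values chosen for illustration. It also swaps `event.wait(0.4)` for `event.is_set()`, which returns immediately, on the assumption that a non-blocking check between chunks is enough to notice a cancel request.

```python
from pathlib import Path
from typing import Iterator


def iter_chunks(contents: bytes, block_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive block_size-byte slices of contents."""
    for start in range(0, len(contents), block_size):
        yield contents[start:start + block_size]


def write_cancellable(path: Path, contents: bytes, cancel_event,
                      block_size: int = 64 * 1024) -> int:
    """Write contents to path chunk by chunk, stopping early on cancel.

    cancel_event is any object with an is_set() method, for example a
    threading.Event or a multiprocessing.Event (an assumption of this sketch).
    Returns the number of bytes actually written.
    """
    written = 0
    with open(path, "wb") as fp:
        for chunk in iter_chunks(contents, block_size):
            # is_set() does not block, unlike wait(0.4), which can pause
            # for up to 0.4 s on every iteration.
            if cancel_event.is_set():
                break
            written += fp.write(chunk)
    return written
```

Calling `write_cancellable(DIRPATH / filename, contents, event)` from the background process would then write the upload in blocks and stop at the next block boundary once the event is set. A partially written file is left on disk in that case, so the caller may want to delete it.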
winter