
My Flask API needs to handle large files (multiple GB). However, I don't need the complete file: I only need the first n lines, so uploading the entire file is just a bottleneck for my API.

Currently, I am using the FileField from flask_wtf.file.

I am using code similar to this:

import os
from werkzeug.utils import secure_filename

@app.route('/home', methods=['GET', 'POST'])
def home():
    form = get_File_Field()
    if form.validate_on_submit():
        huge_file = form.file.data
        # sanitize the client-supplied filename before saving
        name = secure_filename(huge_file.filename)
        huge_file.save(os.path.join(path, name))  # path: the upload directory

get_File_Field() returns a form containing the FileField from flask_wtf.

Is there a way to just upload n lines and then stop the upload?

B.abyface

2 Answers


What you want to do, conceptually, is pass the task to the background so that the site returns a 200 response while the upload continues. Often you would create a task ID so that the user can return to a URL and check whether the file has finished uploading, processing, etc. The most common way this is achieved is with a task queue that stores its state in third-party software such as RabbitMQ or Redis. Celery is a very commonly used Python library for such scheduling.

For more detail, check out https://flask.palletsprojects.com/en/1.1.x/patterns/celery/ and https://blog.miguelgrinberg.com/post/using-celery-with-flask

The second link points to an example implementation on GitHub at https://github.com/miguelgrinberg/flask-celery-example
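To make the pattern concrete, here is a minimal sketch of such a task (not taken from the linked examples; the broker URL, task name, and n=100 default are illustrative assumptions):

from itertools import islice
from celery import Celery

# broker URL is an assumption; point it at your own Redis/RabbitMQ
celery = Celery(__name__, broker='redis://localhost:6379/0')

@celery.task
def process_upload(path, n=100):
    # runs in a worker process, so the web request can return
    # immediately with a task ID instead of blocking
    with open(path) as f:
        return list(islice(f, n))

The Flask view would save the upload and call process_upload.delay(path), returning the task ID to the client so it can poll for the result.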

UPDATE: If your goal is merely to obtain the first N bytes of the file, as the questioner's comment suggests, you can do this with the read method, which works for both text and binary files. Like this:

# read the first 1024 bytes of a text file
with open('really_big_file.dat') as f:
    head = f.read(1024)

# for binary data, open the file in binary mode instead:
# with open('sample.bin', 'rb') as f:
#     head = f.read(1024)

In this example, 1024 is the number of bytes to read. If the file is a CSV or similar with line endings, you can use this method to lazily read the file until you hit a "\n", like this:

# read 1024-byte chunks until the first newline is found
with open('really_big_file.dat') as f:
    def read_part():
        return f.read(1024)

    output = ""
    for piece in iter(read_part, ''):
        lines = piece.split("\n")
        if len(lines) > 1:
            # this chunk contains a newline: keep only what precedes it
            output += lines[0]
            break
        output += piece
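If you need the first n lines rather than just the first one, a shorter variant (my sketch; n and the filename are placeholders) uses the file object's lazy line iterator together with itertools.islice:

from itertools import islice

n = 100  # however many lines the API needs
with open('really_big_file.dat') as f:
    # the file iterator yields lines lazily, so only the first
    # n lines are ever read from disk
    first_n_lines = list(islice(f, n))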

See also Lazy Method for Reading Big File in Python?
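Since the comment discussion below shows the real bottleneck is receiving the upload itself, here is a hedged sketch of reading only the first n lines of the request body with Flask's request.stream. It assumes the client POSTs the raw file bytes (e.g. curl --data-binary @file) rather than a multipart form, since multipart framing would otherwise be mixed into the stream; the /upload route and n=100 are illustrative:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    n = 100  # illustrative: number of lines the API needs
    lines = []
    for _ in range(n):
        # request.stream reads the body incrementally as it arrives
        line = request.stream.readline()
        if not line:  # body ended before n lines
            break
        lines.append(line.decode('utf-8', errors='replace'))
    # the rest of the body is simply never read
    return jsonify(lines_read=len(lines))

Note that the server cannot force the client to stop sending; it just stops reading, so the remaining bytes may still travel over the connection unless it is closed early.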

Matt L.
  • I get your idea, but this is unfortunately not what I am looking for. I just need a fraction of the file's content. Scheduling won't do it here... – B.abyface May 19 '20 at 18:09
  • If you only get part of the file, how will you get the rest of the file again later? Will the user re-upload it? – Matt L. May 19 '20 at 18:34
  • That's the point: I don't need the rest of the file. The input file has a repetitive format of strings, i.e. DNA fragments. After n lines of strings, my API has all it needs, so the rest of the file is unnecessary and fully uploading this huge file would slow down the entire process – B.abyface May 19 '20 at 18:49

I am now using JavaScript to read the file: Read n lines of a big text file

B.abyface