
Hope you are having a great day!

In my recent ventures with Python 3.8.5 I have come across a dilemma. Since I am a fairly new programmer, I don't know how to load a single (BIG) file into my program a piece at a time.

To make my question clearer, here is what I want to do:


  1. Let's say that there is a file on my system called "File.mp4" or "File.txt" (1GB in size);
  2. I want to load this file into my program using the `open()` function in `'rb'` mode;
  3. I declare a buffer size of 1024 bytes;

This is the part I don't know how to solve:

  1. I load 1024 bytes into the program
  2. I do whatever I need to do with them
  3. I then load the next 1024 bytes in place of the old buffer
  4. Rinse and repeat until the whole file has been run through.
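The four steps above map onto a plain `while True` loop, which also works on Python versions older than 3.8. This is a minimal sketch, assuming the file is opened in `'rb'` mode; `read_in_chunks` and `process_chunk` are hypothetical names, and the placeholder processing just records each chunk's length:

```python
BUFFER = 1024


def process_chunk(chunk):
    """Hypothetical placeholder: replace with your own processing."""
    return len(chunk)


def read_in_chunks(path, buffer_size=BUFFER):
    """Read `path` in binary mode, at most buffer_size bytes at a time."""
    results = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(buffer_size)  # step 1: load up to buffer_size bytes
            if not chunk:                # b'' (empty bytes) means end of file
                break
            results.append(process_chunk(chunk))  # step 2: do whatever you need
            # steps 3-4: the next iteration's f.read() replaces `chunk`
            # with the following bytes until the file is exhausted
    return results
```

For a 2500-byte file, `read_in_chunks(path)` returns `[1024, 1024, 452]`: every chunk is full except the last one. Only one chunk is held in memory at a time, so the file size does not matter.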

I looked at this question, but either it doesn't fit my case or I just don't know how to implement it: link to the question


This is the whole code you requested:

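BUFFER = 1024

with open('file.txt', 'rb') as f:
    # in 'rb' mode f.read() returns bytes, which never equal the str '',
    # so this condition is always True and the loop never ends
    while (chunk := f.read(BUFFER)) != '':
        print(list(chunk))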
Igrutinovic_l
  • It would be helpful if you could explain why the link doesn't solve your problem. Seems to me it's exactly what you need... – Tomerikoo Aug 27 '20 at 15:50

2 Answers


This is one of the situations that Python 3.8's new walrus operator (`:=`), which both assigns a value to a variable and returns the value it just assigned, is really good for. You can use `file.read(size)` to read the file in 1024-byte chunks and simply stop when there's nothing left to read:

buffer_size = 1024
with open('file.txt', 'rb') as f:
    while (chunk := f.read(buffer_size)) != b'':
        # do things with `chunk`; it has len() == 1024, except possibly
        # the last chunk, which may be shorter
        pass

Note that the `!= b''` part of the condition can be safely removed, since an empty bytes object `b''` evaluates to `False` when used as a boolean expression.
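A minimal sketch of that shorter form, wrapped in a generator so the chunking logic can be reused; the name `iter_chunks` is illustrative, and the walrus operator means it needs Python 3.8 or later:

```python
def iter_chunks(path, buffer_size=1024):
    """Yield successive chunks of at most buffer_size bytes (Python 3.8+)."""
    with open(path, 'rb') as f:
        # b'' is falsy, so the loop stops at end of file without `!= b''`
        while chunk := f.read(buffer_size):
            yield chunk
```

Usage: `for chunk in iter_chunks('file.txt'): ...`. On older Pythons, `iter(functools.partial(f.read, buffer_size), b'')` gives the same sequence of chunks, because `iter()` with a sentinel stops as soon as the callable returns `b''`.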

Green Cloak Guy
  • Thank you for your answer! However, when I convert the chunk into a list and then print the list, I get an infinite loop! – Igrutinovic_l Aug 27 '20 at 15:50
  • @Igrutinovic_I Can you edit your question to show some of the code that's doing that? I can't picture what kind of code would cause that to happen. – Green Cloak Guy Aug 27 '20 at 15:52
  • @Igrutinovic_I Ah, I think I see the problem. I was using `!= ''` in my code, but since we're reading in binary mode, `f.read()` returns a bytes object instead of a string. So it should be `!= b''` instead. OR, removing that conditional entirely should still work. I'll edit my answer to accommodate. – Green Cloak Guy Aug 27 '20 at 16:15
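To see why `!= ''` never stops the loop: in binary mode `f.read()` returns `bytes`, and in Python 3 a `bytes` object never compares equal to a `str`, so `b'' != ''` is always `True`. A small check:

```python
# In 'rb' mode, read() at end of file returns b'' (empty bytes), not '' (empty str).
end_of_file = b''

print(end_of_file != '')   # True: bytes never equal str, so the loop never ends
print(end_of_file != b'')  # False: the correct sentinel for binary mode
print(bool(end_of_file))   # False: why the comparison can be dropped entirely
```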

You can use buffered input from `io` and read each chunk into a reusable `bytearray` with `readinto()`:

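import io

filename = 'file.txt'
buf = bytearray(1024)
with io.open(filename, 'rb') as fp:
    while True:
        size = fp.readinto(buf)  # fills buf in place, returns bytes read
        if not size:
            break

        # do things with buf considering the size: only buf[:size] is valid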
bereal
  • With this I do get the reads, but I get 0x00 characters if the file is smaller than 1024 bytes – Igrutinovic_l Aug 27 '20 at 15:55
  • @Igrutinovic_l yes, that's always the case for any fixed-size buffer; that's why you shouldn't use bytes outside of `buf[0:size]`. – bereal Aug 27 '20 at 16:04
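One way to follow that advice without copying the buffer on every read is to slice a `memoryview` of it, so each slice covers only the `size` bytes that `readinto()` just filled. This is a sketch; the function name `sum_file_bytes` and the byte-summing "processing" are illustrative assumptions standing in for real work:

```python
def sum_file_bytes(path, buffer_size=1024):
    """Add up every byte value in the file, reading into one reusable buffer."""
    buf = bytearray(buffer_size)
    view = memoryview(buf)  # slicing a memoryview does not copy the data
    total = 0
    with open(path, 'rb') as fp:
        while True:
            size = fp.readinto(buf)
            if not size:
                break
            # only view[:size] is valid; bytes past it are still zero
            # (small file) or stale data left over from the previous read
            total += sum(view[:size])
    return total
```

If the code summed all of `buf` instead of `view[:size]`, the stale tail from the previous read would be counted twice on the final, partial read.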