
I would like to read just one part (not chunks) from a 10GB text file with lines and write it into another file. The size of the part should be exactly 25MB.

I have tried linecache.getlines, but it was not exact enough. Thanks.

muc777
    Possible duplicate of [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – Luke Ramsden May 16 '18 at 13:46
  • How important is it that you *read* exactly 25MB compared to the importance that you write the correct 25MB to the output file? – Scott Hunter May 16 '18 at 13:49
  • You can use the method here https://stackoverflow.com/questions/50062474/split-really-large-file-into-smaller-files-in-python-too-many-open-files/50062917#50062917 – basically, `import pandas as pd; import os; df_chunked = pd.read_csv("myLarge.csv", chunksize=30000)` (see the sketch after these comments) – Vipluv May 16 '18 at 13:54
  • If this is a file with lines, can't you use `for line in file_handler:`? – Ken T May 16 '18 at 13:57
  • @ScottHunter aha, to write into the output file will be more important – muc777 May 16 '18 at 14:08
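
As a hedged sketch of the pandas suggestion in the comments: note that chunksize counts rows, not bytes, so the pieces will only approximate 25MB, and the file and chunk-size values below are placeholders.

import pandas as pd

# Read the large CSV in row-based chunks and write each chunk to its own file.
# 30000 rows per chunk is an assumption; tune it to approximate 25MB per piece.
for i, chunk in enumerate(pd.read_csv("myLarge.csv", chunksize=30000)):
    chunk.to_csv("part_{}.csv".format(i), index=False)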

2 Answers


A simple way to perform the split is to use read(), assuming each character is one byte; opening the file in binary mode makes that assumption hold, so each piece is exactly 25MB of bytes.

# Open the source file once; reopening it inside the loop would reread the same first 25MB.
with open('fname.txt', 'rb') as f:
    for nameadd in range(10 * 1024 // 25):          # number of 25MB pieces in ~10GB
        saveTxt = f.read(25 * 1024**2)              # read up to 25MB of bytes
        if not saveTxt:                             # stop once the file is exhausted
            break
        with open(str(nameadd) + 'fname.txt', 'wb') as fSave:
            fSave.write(saveTxt)
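
If you only need a single 25MB piece rather than a full split, a minimal sketch using seek() could look like this (the piece index and file names are assumptions, and binary mode keeps the byte count exact):

piece = 3                          # hypothetical: the 25MB piece you want (0-based)
size = 25 * 1024**2                # 25MB in bytes

with open('fname.txt', 'rb') as f: # binary mode so 25MB means 25MB of bytes
    f.seek(piece * size)           # jump straight to the start of that piece
    data = f.read(size)            # read exactly 25MB (less at the end of the file)

with open('part.txt', 'wb') as out:
    out.write(data)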
knk

This is already described in [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python); just set the default chunk size to 25MB:

def read_in_chunks(file_object, chunk_size=25*1024*1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 25MB."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

f = open('really_big_file.dat')
for piece in read_in_chunks(f):
    process_data(piece)
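
For the question's goal of writing one 25MB piece to another file, a usage sketch along these lines might work (the file names and binary mode are assumptions, not part of the linked answer):

# Write only the first 25MB piece to a separate file; 'rb'/'wb' keep the
# byte count exact, and next() pulls just one chunk from the generator.
with open('really_big_file.dat', 'rb') as f, open('first_25mb.dat', 'wb') as out:
    out.write(next(read_in_chunks(f)))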
Vipluv