
I have a huge DNA sequence saved in a 140GB .txt file that I would like to open in a text editor. Notepad, Python, and R can't open a file that large. Is there a dedicated text editor that can open large files?

I am currently using this code in Python to open the 140GB .txt file:

path = open(r"my_file_path\my_140GB_file.txt", "r")  # raw string so the backslash isn't treated as an escape
file = path.read()  # tries to load all 140GB into memory
print(file)

The error message is `MemoryError`, pointing at the line `file = path.read()`.

  • And what is your question? Loading the entire document at once is not feasible with something like `read()` on a file object; you can instead read parts of it incrementally, either by line, character, or byte, and split them up as you'd like – Jeremy Mar 06 '22 at 15:19
  • Thank you Jeremy, I added more information to the question to clarify my issue – d.cio Mar 06 '22 at 15:26
  • 1
    Python can open a file of any size. Show us the Python code you are using that you think will not open the file. I would guess that you are using a high-level function that opens the file and then attempts to read it all into memory. That will obviously not work, in any language. – BoarGules Mar 06 '22 at 16:22
  • Thank you, I added the Python code I am using to open the 140GB .txt file – d.cio Mar 06 '22 at 17:36
  • see https://stackoverflow.com/questions/6475328/how-can-i-read-large-text-files-line-by-line-without-loading-it-into-memory – balderman Mar 07 '22 at 15:42
  • I can't mark this question as duplicate because the other question has no accepted answer, but here you go, this applies to you as well: https://stackoverflow.com/a/71139812/1319284 – kutschkem Mar 07 '22 at 15:42
  • @d.cio It's still unclear _what you're trying to achieve_ - you can't display 140GB worth of text at once anyway (where would you find a screen big enough?), so what do you want to _do_ with it? – Mathias R. Jessen Mar 07 '22 at 15:45
  • Apparently (?), this is not a Python question. It is a question about Windows text editors. Please update your question. There is no text editor that loads a file that big into memory. Linux tools such as `less` can show you a file of practically any size because they don't read the whole thing in either. See https://stackoverflow.com/q/159521/503621 – B. Shea Mar 07 '22 at 15:49

1 Answer


There are multiple ways to read a large text file in Python. If it is a delimited file, you might want to use the pandas library.

You can use a context manager and read the file in fixed-size chunks as follows.

Python 3.8+

with open(r"my_file_path\my_140GB_file.txt", "r") as f:
    while chunk := f.read(1024 * 10):   # you can use any chunk size you want
        do_something(chunk)
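As a concrete illustration of what `do_something` might do for a DNA file, here is a sketch that tallies base frequencies chunk by chunk. The `count_bases` helper and the in-memory `StringIO` stand-in for the 140GB file are hypothetical, not part of the original question:

```python
from collections import Counter
import io

def count_bases(f, chunk_size=1024 * 10):
    """Stream a sequence file and tally characters chunk by chunk."""
    counts = Counter()
    while chunk := f.read(chunk_size):  # Python 3.8+ walrus operator
        counts.update(chunk)
    return counts

# Small in-memory file standing in for the real 140GB file:
counts = count_bases(io.StringIO("ACGTACGTAA"))
print(counts["A"])  # 4
```

Because only one chunk is held in memory at a time, peak memory stays at roughly the chunk size regardless of how large the file is.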

Before Python 3.8

You can iterate with `iter()` and a sentinel value, passing the read as a lambda:

with open(r"my_file_path\my_140GB_file.txt", "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 10), b""):  # sentinel must be b"" in binary mode
        do_something(chunk)

Or, if the file is line based, you can read it line by line. (Note this only helps if the file actually contains newlines; a sequence stored as one long line would still be read all at once.)

with open(r"my_file_path\my_140GB_file.txt", "r") as f:
    for line in f:
        do_something(line)

Pandas DataFrame for delimited files

If your file is delimited (like a csv), then you might consider using pandas.

import pandas as pd

for chunk in pd.read_csv(r"my_file_path\my_140GB_file.csv", chunksize=10_000):  # rows per chunk
    do_something(chunk)
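To make the chunking behaviour concrete, here is a small self-contained sketch; the inline CSV and its `length` column are made up for illustration. Each chunk arrives as an ordinary DataFrame, so you can aggregate across chunks without ever loading the whole file:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for a huge delimited file.
csv_data = io.StringIO("id,length\n1,100\n2,250\n3,75\n")

total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):  # each chunk is a DataFrame of up to 2 rows
    total += chunk["length"].sum()

print(total)  # 425
```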

cadvena