
I'm facing a dead-kernel problem. I'm trying to read over 2000 .txt files into a list of lists; my_path contains the paths to these 2000+ files. I tried try/except as below, but it didn't help. The kernel seems to die randomly: I tried to find the files where it breaks, but it breaks on files that weren't a problem during a previous run.

my_list = []
for path in my_path:
    try:
        with open(path) as f:  # the with block closes the file; no f.close() needed
            my_list.append(f.read().splitlines())
    except OSError:  # a bare except: would silently swallow unrelated errors
        print(path)

I also tried opening the files where the kernel died separately, and they seem to work fine. I assume something is wrong with my loop?

UPD: I'm using EndeavourOS, Jupyter in VS Code, 16 GB of RAM. I split the paths, and it looks like I'm running out of memory. I tried `del ...` and `gc.collect()`, but they don't free the memory, and once usage goes over 12 GB the kernel crashes.
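One way to cap memory is to keep only the first part of each file, e.g. with `itertools.islice` (a sketch; the 1000-line cap and the throwaway sample files are assumptions for illustration, standing in for the real `my_path`):

```python
import os
import tempfile
from itertools import islice

MAX_LINES = 1000  # assumed cap; tune to your data

# two small sample files standing in for the 2000+ real ones
tmpdir = tempfile.mkdtemp()
my_path = []
for i in range(2):
    p = os.path.join(tmpdir, f"file{i}.txt")
    with open(p, "w") as f:
        f.write("\n".join(f"line {n}" for n in range(5)))
    my_path.append(p)

my_list = []
for path in my_path:
    with open(path) as f:
        # islice stops after MAX_LINES lines, so the rest of a
        # huge file is never read into memory
        my_list.append([line.rstrip("\n") for line in islice(f, MAX_LINES)])
```

Because `islice` consumes the file object lazily, a file with millions of lines still only contributes at most `MAX_LINES` strings to `my_list`.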

haven
  • It would be good if you could add more information to the question, e.g. memory usage, system you're running, etc. – Timus Oct 13 '21 at 08:18
  • Okay, then my guess would be that `my_list` just gets too big. What are you trying to do with `my_list`? Maybe a more _lazier_ approach would avoid the problem (like producing the lines through a generator when needed)? – Timus Oct 13 '21 at 14:53
  • @Timus thanks a lot, I think you are right, the final list gets way too large, which overloads the system. I'm not quite sure what you mean by `producing the lines through a generator when needed`, but after some data inspection I'm just limiting it to the first 1000 lines per file, and it seems to work okay for my sample. – haven Oct 15 '21 at 01:59
  • Sorry for the cryptic language. What I meant is: By using generators you can construct objects which can parse large amounts of data pretty memory-efficiently (_"lazy"_). Whether that option is available here or not depends on what you want to achieve, therefore my question regarding your intentions with `my_list`. If you're interested look [here](http://www.dabeaz.com/generators-uk/GeneratorsUK.pdf) for example (or [here](http://www.dabeaz.com/generators2/index.html?utm_source=pocket_mylist)). – Timus Oct 15 '21 at 07:42
  • ... or [this](https://stackoverflow.com/questions/17444679/reading-a-huge-csv-file) question and the _accepted_ answer. (No guarantee, though, that it is applicable in your use case.) – Timus Oct 15 '21 at 11:08
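The generator approach Timus describes could look roughly like this (a sketch, not the commenter's actual code; the `iter_file_lines` name and the throwaway demo files are assumptions):

```python
import os
import tempfile

def iter_file_lines(paths):
    """Yield one file's lines at a time, so only a single
    file's contents are ever held in memory."""
    for path in paths:
        with open(path) as f:
            yield f.read().splitlines()

# demo: two throwaway files standing in for the real my_path
tmpdir = tempfile.mkdtemp()
my_path = []
for i in range(2):
    p = os.path.join(tmpdir, f"file{i}.txt")
    with open(p, "w") as f:
        f.write("a\nb\nc")
    my_path.append(p)

# consume lazily: process each file's lines, then let them be freed
line_counts = [len(lines) for lines in iter_file_lines(my_path)]
```

The key difference from building one big `my_list` is that each file's lines are processed and then discarded before the next file is read, so peak memory is bounded by the largest single file rather than the sum of all of them.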

0 Answers