
After sorting a dataset, I have a problem at this point in my code.

with open(fns_land[xx]) as infile:
    lines = infile.readlines()
    for line in lines:
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])

The problem is in the `readlines()` line: the file is sometimes too large, and the process gets killed.

Is there a short/nice way to rewrite this point?

toti08
S.Kociok
  • Possible duplicate of [Python readlines() usage and efficient practice for reading](https://stackoverflow.com/questions/17246260/python-readlines-usage-and-efficient-practice-for-reading) – The Pjot Nov 14 '18 at 14:23

2 Answers


Use `readline` instead; this reads one line at a time without loading the entire file into memory.

with open(fns_land[xx]) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])
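An even simpler equivalent, for what it's worth: iterating the file object directly also streams one line at a time. A minimal sketch, using an in-memory `StringIO` with made-up sample rows in place of the real station file:

```python
import io

# Hypothetical sample rows standing in for the contents of fns_land[xx].
data = (
    "KJFK,2018-11-14,METAR KJFK 141200Z\n"
    "EDDF,2018-11-14,METAR EDDF 141200Z\n"
)

result_station, result_date, result_metar = [], [], []

# Iterating the file object yields one line at a time, like readline(),
# but without the explicit while/break loop.
with io.StringIO(data) as infile:  # replace with open(fns_land[xx]) for a real file
    for line in infile:
        fields = line.rstrip("\n").split(",")
        result_station.append(fields[0])
        result_date.append(fields[1])
        result_metar.append(fields[-1])

print(result_station)  # ['KJFK', 'EDDF']
```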
Rocky Li

If you are dealing with a dataset, I would suggest that you have a look at pandas, which is great for data wrangling.

If your problem is a large dataset, you could load the data in chunks.

import pandas as pd
tfr = pd.read_csv('fns_land{0}.csv'.format(xx), iterator=True, chunksize=1000)
  1. Line 1: imports the pandas module.
  2. Line 2: reads the data from your csv file in chunks of 1000 lines.

This will be of type pandas.io.parsers.TextFileReader. To load the entire csv file, you follow up with:

df = pd.concat(tfr, ignore_index=True)

The parameter ignore_index=True is added to avoid duplicate indexes.

You now have all your data loaded into a dataframe. Then do your data manipulation on the columns as vectors, which is also faster than processing line by line.
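To illustrate the chunked read end to end, here is a minimal sketch using an in-memory `StringIO` with made-up rows in place of the real csv file (chunksize of 1 here just to show the mechanism; 1000 in practice):

```python
import io
import pandas as pd

# Hypothetical csv contents standing in for the real station file.
data = (
    "station,date,metar\n"
    "KJFK,2018-11-14,METAR KJFK\n"
    "EDDF,2018-11-14,METAR EDDF\n"
)

# Read in chunks, then stitch the chunks back into one dataframe.
tfr = pd.read_csv(io.StringIO(data), iterator=True, chunksize=1)
df = pd.concat(tfr, ignore_index=True)

# Column access is vectorized: no per-line loop needed.
stations = df["station"].tolist()
print(stations)  # ['KJFK', 'EDDF']
```

If you only need a few columns out of many, `read_csv` also accepts a `usecols` parameter, which cuts memory use further.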

Have a look at this question, which dealt with something similar.

Philip
  • Thanks. But for my use case, the open method was the best way. I only want to read three columns out of 1000 columns. Next time, pandas may be the better way. – S.Kociok Nov 14 '18 at 14:54