
After sorting a dataset, I have a problem at this point in my code.

with open(fns_land[xx]) as infile:
    lines = infile.readlines()
    for line in lines:
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])

The problem is in the `readlines()` line: the file is sometimes too large, and the process gets killed.

Is there a short/nice way to rewrite this point?

toti08
S.Kociok
  • Possible duplicate of [Python readlines() usage and efficient practice for reading](https://stackoverflow.com/questions/17246260/python-readlines-usage-and-efficient-practice-for-reading) – The Pjot Nov 14 '18 at 14:23

2 Answers


Use `readline` instead; this reads one line at a time without loading the entire file into memory.

with open(fns_land[xx]) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])
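An even simpler equivalent, for what it's worth: iterating the file object directly also streams one line at a time. A minimal sketch, using an in-memory `StringIO` with made-up sample rows in place of the real station file:

```python
import io

# Hypothetical sample rows standing in for the contents of fns_land[xx].
data = (
    "KJFK,2018-11-14,METAR KJFK 141200Z\n"
    "EDDF,2018-11-14,METAR EDDF 141200Z\n"
)

result_station, result_date, result_metar = [], [], []

# Iterating the file object yields one line at a time, like readline(),
# but without the explicit while/break loop.
with io.StringIO(data) as infile:  # replace with open(fns_land[xx]) for a real file
    for line in infile:
        fields = line.rstrip("\n").split(",")
        result_station.append(fields[0])
        result_date.append(fields[1])
        result_metar.append(fields[-1])

print(result_station)  # ['KJFK', 'EDDF']
```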
Rocky Li

If you are dealing with a dataset, I would suggest that you have a look at pandas, which is great for data wrangling.

If your problem is a large dataset, you could load the data in chunks.

import pandas as pd
tfr = pd.read_csv('fns_land{0}.csv'.format(xx), iterator=True, chunksize=1000)
  1. Line 1: imports the pandas module.
  2. Line 2: reads the data from your csv file in chunks of 1000 lines.

This will be of type pandas.io.parsers.TextFileReader. To load the entire csv file, you follow up with:

df = pd.concat(tfr, ignore_index=True)

The parameter ignore_index=True is added to avoid duplicate indexes.

You now have all your data loaded into a dataframe. Then do your data manipulation on the columns as vectors, which is also faster than processing line by line.
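To illustrate the chunked read end to end, here is a minimal sketch using an in-memory `StringIO` with made-up rows in place of the real csv file (chunksize of 1 here just to show the mechanism; 1000 in practice):

```python
import io
import pandas as pd

# Hypothetical csv contents standing in for the real station file.
data = (
    "station,date,metar\n"
    "KJFK,2018-11-14,METAR KJFK\n"
    "EDDF,2018-11-14,METAR EDDF\n"
)

# Read in chunks, then stitch the chunks back into one dataframe.
tfr = pd.read_csv(io.StringIO(data), iterator=True, chunksize=1)
df = pd.concat(tfr, ignore_index=True)

# Column access is vectorized: no per-line loop needed.
stations = df["station"].tolist()
print(stations)  # ['KJFK', 'EDDF']
```

If you only need a few columns out of many, `read_csv` also accepts a `usecols` parameter, which cuts memory use further.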

Have a look at this question, which dealt with something similar.

Philip
  • Thanks. But for my use case, the open method was the best way. I only want to read three columns out of 1000 columns. Next time, pandas may be the better way. – S.Kociok Nov 14 '18 at 14:54