I am trying to iterate through a .csv file and do some calculations on it. My problem is that the file is 10001 lines long, but when my program runs it only seems to read 5001 of those lines. Am I doing something wrong when reading in my data, or am I running into a memory limit or some other limitation? The calculations themselves are fine, but the results differ from the expected values in some instances, so I am led to believe that reading the missing half of the data will fix this.

fileName = 'normal.csv' #input("Enter a file name: ").strip()
file = open(fileName, 'r') #open the file for reading
header = file.readline().strip().split(',') #Get the header line
data = [] #Initialise the dataset
for index in range(len(header)):
    data.append([])
for yy in file:
    ln = file.readline().strip().split(',') #Store the line
    for xx in range(len(data)):
        data[xx].append(float(ln[xx]))

And here is some sample output. It is not completely formatted yet, but it will be eventually:

"""The file normal.csv contains 3 columns and 5000 records.
         Column Heading   |        Mean        |     Std. Dev.      
      --------------------+--------------------+--------------------
      Width [mm]|999.9797|2.5273
      Height [mm]|499.9662|1.6889
      Thickness [mm]|12.0000|0.1869"""

As this is homework, I would ask that you keep responses helpful but not give the solution outright. Thank you.

brodieR
  • AFAICT, you are reading 2 lines in one iteration. "yy" already contains a line, and calling "file.readline" will move you to the next line. You should directly process the contents of "yy" without calling readline. – schaazzz Oct 10 '17 at 17:07

1 Answer

That's because you are asking Python to read lines in two different locations:

for yy in file:

and

ln = file.readline().strip().split(',') #Store the line

yy is already a line from the file, but you ignored it; iteration over a file object yields lines from the file. You then read another line using file.readline().
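A small self-contained sketch of that effect, using an in-memory io.StringIO in place of the real file. The sample contents and the helper name read_mixed are made up for illustration; only the loop pattern is taken from the question:

```python
import io

# Hypothetical sample: one header line followed by six data lines.
SAMPLE = "a,b\n" + "".join(f"{n},{n * 2}\n" for n in range(1, 7))

def read_mixed(stream):
    """Reproduce the pattern from the question: iterate AND call readline()."""
    stream.readline()                # consume the header line
    kept = []
    for yy in stream:                # the loop consumes one line into yy ...
        ln = stream.readline()       # ... and readline() consumes the next one
        kept.append(ln.strip().split(','))
    return kept

rows = read_mixed(io.StringIO(SAMPLE))
print(len(rows))   # 3: only every second data line is kept
print(rows[0])     # ['2', '4']: the first data line went into yy and was dropped
```

With six data lines only three are kept, which matches the 10000-to-5000 halving seen in the question.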

If you use iteration, don't call readline() as well; just use yy:

for yy in file:
    ln = yy.strip().split(',') #Store the line

You are re-inventing the CSV-reading wheel, however. Just use the csv module instead.

You can read all data in a CSV file into a list per column with some zip() function trickery:

import csv

with open(fileName, 'r', newline='') as csvfile:
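    # QUOTE_NONNUMERIC converts every unquoted field to float, so the header
    # row is only read correctly if its fields are quoted in the file
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)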
    header = next(reader, None)   # read one row, the header, or None
    data = list(zip(*reader))  # transpose rows to columns
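As a rough illustration of the zip() transpose, here is a variant that converts the values explicitly, so it does not depend on how the header is quoted. The sample data, the helper name read_columns, and the use of the statistics module (sample standard deviation) are assumptions for the sake of the example, not something taken from the question:

```python
import csv
import io
import statistics

# Hypothetical stand-in for normal.csv; the values are invented.
SAMPLE = "Width [mm],Height [mm]\n1000.0,500.0\n1002.0,498.0\n998.0,502.0\n"

def read_columns(stream):
    """Return (header, columns), with each column a tuple of floats."""
    reader = csv.reader(stream)
    header = next(reader, None)                  # first row holds the column names
    # Convert each non-empty row to floats; blank lines are skipped.
    rows = ([float(value) for value in row] for row in reader if row)
    columns = list(zip(*rows))                   # transpose rows to columns
    return header, columns

header, columns = read_columns(io.StringIO(SAMPLE, newline=''))
for name, values in zip(header, columns):
    # statistics.stdev is the sample standard deviation (an assumption here)
    print(name, statistics.mean(values), statistics.stdev(values))
```

Each entry in columns lines up with the header entry at the same index, which is the same layout the question builds by hand in data.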
Martijn Pieters
  • That did it. Thanks. Changed `ln = file.readline().strip().split(',')` to `ln = yy.strip().split(',')` Works like a charm. – brodieR Oct 10 '17 at 17:07
  • As for re-inventing the wheel, that is what most comp-sci courses entail, but thanks for the heads up anyhow. – brodieR Oct 10 '17 at 17:12