import csv

data = {}
f = open("train.csv")
csv_f = csv.reader(f)
labels = next(csv_f)
for i in range(len(labels) - 1):
    a = []
    for row in csv_f:
        a += row[i]
    data [labels[i]] = a

I am using the code above to read a CSV file and put its columns into a dictionary, with the column labels as the keys and a list of each column's values as the values. This works for my first column, which for my data set is 'ID', but it stops working after that: it just leaves the list a empty. I did some debugging and found that it runs the outer for loop and then the inner loop, but on the second pass through the outer for loop it skips the inner for loop entirely. It does the same for every subsequent pass.

Why does it do that?

And how can I fix my code to make it stop?

Slava
  • Do you understand what each of your loops does? – Ignacio Vazquez-Abrams Nov 03 '16 at 04:07
  • Possible duplicate of [Python csv.reader: How do I return to the top of the file?](http://stackoverflow.com/questions/431752/python-csv-reader-how-do-i-return-to-the-top-of-the-file) – Kevin Nov 03 '16 at 04:11
  • Can you give an example of the content of `train.csv` and of the output you want? Your description is not clear. – EvensF Nov 03 '16 at 04:29
  • I know this doesn't directly answer your question, but please consider using [Pandas](http://pandas.pydata.org/). It makes all these issues disappear into `pandas.read_csv('train.csv')`. – chthonicdaemon Nov 03 '16 at 05:14

1 Answer


As Kevin says above, I think the issue is that csv_f is used up after your first pass through it, so on later passes the inner loop has nothing left to iterate over. To run it again you would have to reset the file, and with it the csv reader, back to the beginning of the file. That algorithm also requires you to parse the entire file once per column to collect all the data. A more efficient algorithm would parse the file once, row by row. I haven't checked this code so it might not be 100%, but hopefully it points you in the right direction.

import csv

data = {}
labels = []
isLabelRow = True

# newline="" lets the csv module handle line endings itself (Python 3)
with open("train.csv", newline="") as f:
    csv_f = csv.reader(f)

    for row in csv_f:
        print("Processing row : " + str(row))
        if isLabelRow:
            # Get labels from first row of data
            isLabelRow = False

            # Initialize data "columns"
            for label in row:
                print("Processing label : " + label)
                labels.append(label)
                data[label] = []  # empty list

        else:
            # Add each item in the row to the appropriate "column" in data
            for i in range(len(row)):
                data[labels[i]].append(row[i])
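
To see the behaviour from the question in isolation, here is a minimal Python 3 sketch. It is only an illustration under some assumptions: an in-memory `io.StringIO` stands in for `train.csv`, the sample columns and values are made up, and the helper names `second_pass_is_empty`, `columns_by_rewinding` and `columns_single_pass` are hypothetical. It shows that a consumed `csv.reader` yields nothing on a second pass, then builds the column dictionary two ways: by rewinding the file with `seek(0)` for each column, and by reading the file once.

```python
import csv
import io

# Hypothetical sample standing in for train.csv; the values are made up.
SAMPLE = "ID,Age,Name\n1,30,Ann\n2,41,Bob\n"


def second_pass_is_empty(f):
    """Show that a csv.reader yields nothing once it has been consumed."""
    reader = csv.reader(f)
    first = list(reader)   # consumes every row, header included
    second = list(reader)  # the reader is exhausted, so this is []
    return first, second


def columns_by_rewinding(f):
    """Column-by-column approach, rewinding the file before each column."""
    f.seek(0)
    labels = next(csv.reader(f))
    data = {}
    for i, label in enumerate(labels):
        f.seek(0)               # back to the top of the file
        reader = csv.reader(f)  # fresh reader over the rewound file
        next(reader)            # skip the header row
        data[label] = [row[i] for row in reader]
    return data


def columns_single_pass(f):
    """Read the file once and append each cell to its column."""
    f.seek(0)
    reader = csv.reader(f)
    labels = next(reader)
    data = {label: [] for label in labels}
    for row in reader:
        for label, value in zip(labels, row):
            data[label].append(value)
    return data


f = io.StringIO(SAMPLE)
first, second = second_pass_is_empty(f)
rewound = columns_by_rewinding(f)
single = columns_single_pass(f)
print(second)
print(single)
```

Both helpers collect every column, whereas the loop in the question stops one column short of the end. They also add each cell as a whole string; `a += row[i]` in the question extends the list with the individual characters of the string instead.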