Python reading row using list-comprehension (csv and json files)

Question

I have a question about using list comprehensions when reading a csv or json file. In this first code I am using the normal, long way: I take a value from each row and append it to an initially empty list. This works fine, as expected:

import csv

with open("file.csv") as W_F:
    reader = csv.reader(W_F)
    header = next(reader)  # skip the header row

    brightness, lons, lats = [], [], []

    for row in reader:
        bright = float(row[2])

        brightness.append(bright)
        lons.append(float(row[0]))
        lats.append(float(row[1]))

In this code I tried to make my code shorter by using list comprehensions, but here I run into a problem. Only the first list, brightness = [float(row[2]) for row in reader], gets its values. The other two lists (lon = [float(rows[0])for rows in reader] and lat = [float(rows[1])for rows in reader]) are printed as empty lists.

with open("file.csv") as W_F:
    reader = csv.reader(W_F)
    header = next(reader)

    brightness = [float(row[2]) for row in reader]
    lon = [float(rows[0])for rows in reader]
    lat = [float(rows[1])for rows in reader]

Here I am using a list comprehension while reading a json file. I am getting all the values without a problem:

with open("file.json") as f:
    all_eq_data = json.load(f)

all_eq_dicts = all_eq_data['features']

mag = [dicts["properties"]["mag"] for dicts in all_eq_dicts]
lang = [dicts["geometry"]["coordinates"][0] for dicts in all_eq_dicts]
lat = [dicts["geometry"]["coordinates"][1] for dicts in all_eq_dicts]

Can someone please explain why the list comprehensions in the second code don't work properly? Why does the second code store values only in the first list but not in the others? Why does it work in the third code but not in the second? What is the difference between the first code and the second, and am I doing something wrong? (Note: the first and second code use the same file.)

This is not a good use case for list comprehensions. In your first code you filled all lists in only one for loop. In the second and third code you needed three list comprehensions, which is equivalent to three for loops. — Wups, Sep 25 '20 at 21:02

score 2 · Answer 1 · answered Sep 25 '20 at 20:46

Iterating over an iterator calls its __next__ method once per item to get the next item. A for loop ends when __next__ raises a StopIteration exception because the iterator has no more items to give you. This mechanism "burns out" the iterator: by the end of the iteration there is no "next" item left.
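As a rough sketch of that mechanism, the helper below imitates a for loop by calling iter() and next() by hand. The name manual_for is made up for this illustration; it is not how Python implements the loop internally, only an equivalent in plain code.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next() explicitly."""
    iterator = iter(iterable)
    items = []
    while True:
        try:
            items.append(next(iterator))  # same as iterator.__next__()
        except StopIteration:
            break                         # this is how a for loop knows to stop
    return items, iterator


items, spent = manual_for([1, 2, 3])
leftover = list(spent)  # the iterator is burnt out, so nothing is left

print(items)     # [1, 2, 3]
print(leftover)  # []
```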

In the second code snippet you iterate over the whole iterator, and then, once it is exhausted (when there are no more items in it), you try to iterate over it again in the following list comprehensions.
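The behaviour can be reproduced without a file on disk. In this small sketch io.StringIO stands in for the opened file.csv, and the sample rows are invented for the demonstration; the column order (lon, lat, brightness) follows the indices used in the question.

```python
import csv
import io

# Made-up sample data; io.StringIO stands in for the opened file.
sample = io.StringIO("lon,lat,brightness\n10.5,20.5,300.0\n11.5,21.5,310.0\n")

reader = csv.reader(sample)
header = next(reader)                            # consumes the header row

brightness = [float(row[2]) for row in reader]   # consumes every remaining row
lon = [float(row[0]) for row in reader]          # reader is already exhausted

print(brightness)  # [300.0, 310.0]
print(lon)         # []
```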

The third piece of code creates a new iterator every time you run a for loop over the all_eq_dicts list. That is because a list is an iterable and not an iterator, whereas the csv.reader object is an iterator.
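One way to see the difference is to call iter() yourself. In this sketch the variable names and the sample CSV text are invented for illustration: a list returns a separate iterator on each call, while a csv.reader returns itself.

```python
import csv
import io

numbers = [1, 2, 3]

# A list is an iterable: each iter() call builds a separate iterator object.
list_gives_new_iterator = iter(numbers) is not iter(numbers)

# A csv.reader is an iterator: iter() just returns the same object again.
# io.StringIO stands in for an open file; the sample data is made up.
reader = csv.reader(io.StringIO("lon,lat,brightness\n1.0,2.0,3.0\n"))
reader_is_its_own_iterator = iter(reader) is reader

print(list_gives_new_iterator)     # True
print(reader_is_its_own_iterator)  # True
```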

In conclusion: the second code operates on an iterator (which burns out after the first use) and the third one operates on an iterable (which returns a new iterator every time you loop over it). For more information, read up on the differences between an iterator and an iterable.
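If you still want the three comprehensions, one possible fix (a sketch, not the only option) is to copy the remaining rows into a list first, since a list can be looped over as often as you like. Here io.StringIO and the sample rows are stand-ins for your file.csv, and this approach assumes the file is small enough to hold in memory.

```python
import csv
import io

# Made-up sample data; replace io.StringIO(...) with open("file.csv") in real use.
sample = io.StringIO("lon,lat,brightness\n10.5,20.5,300.0\n11.5,21.5,310.0\n")

reader = csv.reader(sample)
header = next(reader)   # skip the header row
rows = list(reader)     # read the remaining rows once, into a reusable list

brightness = [float(row[2]) for row in rows]
lons = [float(row[0]) for row in rows]
lats = [float(row[1]) for row in rows]

print(brightness)  # [300.0, 310.0]
print(lons)        # [10.5, 11.5]
print(lats)        # [20.5, 21.5]
```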

Thank you :) I will be reading more about iterator, iterable and iteration — Shad0w, Sep 25 '20 at 20:58

Raghav Gupta · Answer 2 · 2020-09-26T06:40:52.760

What tripped you up here is how the iterator function next() works.

Now you need to understand a simple concept here. A for loop is made up of two things.

1. Iterator

2. Caller

The iterator is the object that hands out items one at a time, while the caller (my own informal term, not official Python terminology) is whatever calls next() on it to fetch the next item. Each call to next() consumes one item: once an item has been returned, the iterator moves past it and will not hand it out again. When every item has been consumed, the iterator is exhausted and looping over it again yields nothing.

It will become more clear from the example below :

Here's a simple list

a = [1, 2, 3, 4]
b = iter(a)    # Creates an iterator object over the list

next(b)        # Asks the iterator for its next item
>> 1
next(b)
>> 2
next(b)
>> 3

And this is exactly what for loop does without having to explicitly make an iterator and call next() each time (It takes care of that by default for us!)

for i in range(1, 4):  # The for loop creates an iterator and calls next() on it repeatedly
    print(i)

>> 1
>> 2
>> 3

In your second code block, next(reader) consumes the header row. Then brightness = [float(row[2]) for row in reader] keeps calling next() behind the scenes until the reader has handed out every remaining row. By the time the lon and lat comprehensions run, the reader is already exhausted and there is nothing left to iterate over. That's why you get two empty lists.

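However, if you want to get the items using next(), I suggest you unpack the values of a data row (unpacking header itself would only give you the column names):

with open("file.csv") as W_F:
    reader = csv.reader(W_F)
    header = next(reader)                    # skip the header row

    lon, lat, brightness = next(reader)[:3]  # first data row; the values are still strings

Convert these unpacked values with float() and append them to a dictionary or list, repeating this for each row.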
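To apply the unpacking idea to every row instead of a single one, one option is to unpack directly in a for loop, so the reader is only walked once. This is a sketch under assumptions: io.StringIO and the sample rows stand in for file.csv, and the column order (lon, lat, brightness) is taken from the indices used in the question.

```python
import csv
import io

# Made-up sample data; replace io.StringIO(...) with open("file.csv") in real use.
sample = io.StringIO("lon,lat,brightness\n10.5,20.5,300.0\n11.5,21.5,310.0\n")

reader = csv.reader(sample)
header = next(reader)  # skip the header row

lons, lats, brightness = [], [], []
# *extra soaks up any further columns so the unpacking does not fail on wider rows.
for lon, lat, bright, *extra in reader:
    lons.append(float(lon))
    lats.append(float(lat))
    brightness.append(float(bright))

print(lons)        # [10.5, 11.5]
print(lats)        # [20.5, 21.5]
print(brightness)  # [300.0, 310.0]
```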

Hope you understood !!

Yes it's not. There hasn't been a formal terminology acceptance for the `next()` function from the past decade. However, it certainly connects with the object orientation method call which I think it's important for people so they can understand the concept behind it. Maybe this comment will make "caller" an official terminology :-) — Raghav Gupta, Sep 25 '20 at 22:10
