IndexError: invalid index

Question

I am reading a dataset and want every element of each row except the last one in train, with the last element as target. Building target works and I can print it, but when the code reaches train = ... I get this error: IndexError: invalid index

dataset = np.genfromtxt(open(train_file,'r'), delimiter=',',dtype=None)[1:]
target = [x[401] for x in dataset]
train = [x[0:400] for x in dataset]

I also tried: [x[:-1] for x in dataset] but I get the same error.

Data set is big but this is a sample:

xxx,-0.011451,-0.070532,...,-0.011451,-0.070532,O

Because I want the first 401 elements of every element in dataset. dataset is an array of lists. — Nick, Apr 17 '15 at 23:55
could it be that when you've gotten to `train` you're at the end of the file, so `x` is `None`? — abcd, Apr 17 '15 at 23:57
You haven't provided any information about `dataset`, `genfromtxt()`, or `train_file`. Any answers will just be guesses, trying to bruteforce the solution. — TigerhawkT3, Apr 17 '15 at 23:58
Perhaps it would help if you found a different wording instead of "once hit the train will pop out this error" because that makes absolutely zero sense. You might try a longer code sample and a traceback. — Paul Cornelius, Apr 18 '15 at 00:01
@TigerhawkT3 [genfromtxt()](http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) is part of NumPy. — Tutleman, Apr 18 '15 at 00:12
I see that now... after it was mentioned in a comment on an answer. That sort of information should be present in the question's text and/or a tag to encourage useful answers. — TigerhawkT3, Apr 18 '15 at 00:18
It would be strange that target would work and train wouldn't, but have you checked this just to see if the lengths are all as expected? for x in dataset: len(x) — sage88, Apr 18 '15 at 00:31
Can you get it to work using numpy access syntax: train = dataset[0:len(dataset)][0:400] — sage88, Apr 18 '15 at 00:41
Also, just so you know, you have an off by 1 error in train. It should be: train = [x[0:401] for x in dataset] — sage88, Apr 18 '15 at 00:58

Tutleman · Answer 1 · 2015-04-18T00:05:29.447

Your issue appears to be with understanding how list comprehensions work, and when you might want to use one.

A list comprehension goes through every item in a list, applies an expression to it, and can optionally filter out some elements. For instance, if I had the following list:

digits = [1, 2, 3, 4, 5, 6, 7]

And I used the following list comprehension:

squares = [i * i for i in digits]

I would get: [1, 4, 9, 16, 25, 36, 49]

I could also do something like this:

even_squares = [i * i for i in digits if i % 2 == 0]

Which would give me: [4, 16, 36]

Now let's talk about your list comprehensions in particular. You wrote [x[401] for x in dataset], which, in English, reads as "a list containing the element at index 401 (the 402nd element, since indexing starts at 0) of each item in the list called dataset".

Now, in all likelihood, some line of your dataset has fewer than 402 items, meaning that, when you try to access index 401 on it, you get an error.
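As a small illustration of the indexing rules involved, here is a sketch using a hypothetical plain Python list with 402 values in place of one line of the real data. It only shows how zero-based indexing and end-exclusive slices behave; it does not reproduce the asker's NumPy setup.

```python
# Hypothetical row with 402 values, standing in for one line of the data
row = list(range(402))

last = row[401]            # index 401 is the 402nd (last) element
first_400 = row[0:400]     # indices 0..399, the end index is excluded
all_but_last = row[0:401]  # indices 0..400, same result as row[:-1]

print(last, len(first_400), len(all_but_last))  # 401 400 401
print(all_but_last == row[:-1])                 # True

short_row = list(range(401))  # only 401 values, so index 401 does not exist
try:
    short_row[401]
except IndexError as err:
    print("IndexError:", err)
```

Note that x[0:400] keeps 400 values, so with 402 items per line it drops the last two rather than only the last one.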

It sounds like you're just trying to get all the elements in dataset excluding the last one. To do that, you can use Python's slice notation. If you write dataset[:-1], you'll get all items in dataset other than the last one. Similarly, if you wrote dataset[:-2], you'd get all items except for the last two, and so on. The same works if you want to cut off the front of the list: dataset[1:-1] will give you all items in the list excluding the first (index 0) and last items.
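To make the difference concrete, here is a minimal sketch with a made-up three-row dataset of plain Python lists (the values are hypothetical). Slicing dataset itself drops whole rows, while slicing inside a comprehension drops the last value of each row.

```python
# Hypothetical dataset: each row is id, two floats, and a string label
dataset = [
    [0, -0.01, -0.07, "O"],
    [1, 0.25, -0.50, "B"],
    [2, 0.10, 0.30, "O"],
]

rows_but_last = dataset[:-1]            # drops the last row
train = [row[:-1] for row in dataset]   # drops the last value of every row
target = [row[-1] for row in dataset]   # keeps only the last value of every row

print(len(rows_but_last))  # 2
print(train[0])            # [0, -0.01, -0.07]
print(target)              # ['O', 'B', 'O']
```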

Edit: Now that I see the new comments on your post, it's clear that you are trying to get the first 401 elements of each item in the dataset. Unfortunately, because we don't know anything about your dataset, it's impossible to say what exactly the issue is.

You got me wrong. I have 402 items in each line. I can pass and get `[x[401] for x in dataset]` with no problem. When I do `dataset[:-1]`, I get that error. — Nick, Apr 18 '15 at 00:05
@Nick Can you give us some more information? A good place to start would be the rest of the error message. — Tutleman, Apr 18 '15 at 00:07
There is no rest of the error message. It gives me this error and exits. I guess it might have something to do with numpy. I also added a sample of my data, but the full dataset is too big to copy here. — Nick, Apr 18 '15 at 00:09
@Nick Is it possible that your columns have different dtypes? That could cause this issue. — Tutleman, Apr 18 '15 at 00:13
Well, my columns are all numbers except the last one. The first one is an integer (0 or 1 or 2), the next 400 are negative and positive floats, and the last one is a string. — Nick, Apr 18 '15 at 00:37
@Nick I bet that's your problem. See [this question](http://stackoverflow.com/questions/7093431/numpy-array-column-slicing-produces-indexerror-invalid-index-exception). — Tutleman, Apr 18 '15 at 04:44
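A rough sketch of the mixed-dtype explanation suspected in these comments, using a made-up three-line sample with no header (an assumption, the real file is not shown). With dtype=None and columns of different types, np.genfromtxt returns a 1-D structured array, so each x is a record rather than a list. Integer indexing on a record works, but slicing it raises an IndexError; the exact message text varies between NumPy versions. Selecting columns by field name is one possible workaround, not necessarily what the asker ended up doing.

```python
import io
import numpy as np

# Hypothetical sample: integer id, two floats, string label (no header row)
csv_text = "0,-0.011451,-0.070532,O\n1,0.25,-0.5,B\n2,0.1,0.3,O\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     dtype=None, encoding="utf-8")
print(data.ndim)         # 1: a 1-D structured array, one record per line
print(data.dtype.names)  # one auto-generated field name per column

try:
    train = [x[0:3] for x in data]  # slicing a record is not supported
    slicing_failed = False
except IndexError as err:
    slicing_failed = True
    print("IndexError:", err)

# Possible workaround: pick columns by field name instead of slicing records
names = data.dtype.names
target = data[names[-1]]                                      # label column
train = np.column_stack([data[name] for name in names[:-1]])  # numeric columns
print(train.shape, target.tolist())
```

This would also be consistent with target = [x[401] for x in dataset] working while the train line fails, since only the slice is rejected.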

score 1 · Accepted Answer · answered Apr 18 '15 at 00:19

I just tested this with the following toy code. Your syntax is actually correct. Something is wrong with your input file, not with the way you are selecting elements from your list of arrays.

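import numpy as np

# One row of 402 values (1..402), mimicking a single line of the data
a = np.array(range(1, 403))

# Build a list of five identical rows
dataset = []
for i in range(5):
    dataset.append(a)

target = [x[401] for x in dataset]   # last value of each row
train = [x[0:400] for x in dataset]  # first 400 values of each row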

sage88

The strange thing is that when I test data with `dataset = list(csv.reader(open(train_file, 'rU')))` I don't get any error. It is strange. I have another dataset that works well with np.genfromtxt – Nick Apr 18 '15 at 00:31
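One plausible reason the csv.reader version behaves differently: it yields ordinary Python lists of strings, which support slicing, so the same comprehensions work. A minimal sketch with a hypothetical two-line sample read from memory instead of train_file:

```python
import csv
import io

# Hypothetical sample standing in for the contents of train_file
csv_text = "0,-0.011451,-0.070532,O\n1,0.25,-0.5,B\n"

rows = list(csv.reader(io.StringIO(csv_text)))  # each row is a list of strings

target = [row[-1] for row in rows]                       # last column, kept as text
train = [[float(v) for v in row[:-1]] for row in rows]   # other columns as floats

print(target)    # ['O', 'B']
print(train[0])  # [0.0, -0.011451, -0.070532]
```

The values arrive as strings, so the numeric columns have to be converted explicitly, as done with float() here.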
