
I am practicing regression on the IMDB 5000+ movie metadata set from Kaggle. I use the pandas library to read the CSV file and convert the data to a nested list named movie_data.

I want to delete every row movie_data[n] where movie_data[n][0] != 'Color'. I tried to do the deletion in a for loop, but the code fails at i == 4827 with:

IndexError: list index out of range

Here is my code:

import tensorflow as tf
import numpy as np
import pandas as pd 

tf.set_random_seed(777)

read = pd.read_csv('movie_metadata.csv', sep=',')
movie_data = read.values.tolist()
gross_data = []
for i in range(len(movie_data)):
    gross_data.append(movie_data[i][8])

#delete gross row
for row in movie_data:
    del row[8]

#remove not-colored (e.g. black and white) movie datas
for i in range(len(movie_data)):
    print(i)
    if movie_data[i][0] != 'Color':
        del movie_data[i]

training_movie_data = movie_data[0:3500]
training_gross_data = gross_data[0:3500]

#print(training_movie_data)

The error occurs at line 20: if movie_data[i][0] != 'Color'

How can I fix this?

Stephen Rauch
Tart L.
  • Can you do a print(movie_data[i]) before if movie_data[i][0] != 'Color':? – Allen Qin May 11 '17 at 04:31
  • Possible duplicate of [How to delete a column from a data frame with pandas?](http://stackoverflow.com/questions/28035839/how-to-delete-a-column-from-a-data-frame-with-pandas) – e4c5 May 11 '17 at 04:34
  • At first glance that may not look like a duplicate to you. But I assure you it is. You almost never loop through pandas row by row to modify the data – e4c5 May 11 '17 at 04:35
  • You should not remove list/array entries while iterating over that list/array. – Jan Christoph Terasa May 11 '17 at 04:48

2 Answers


You shouldn't delete elements from a list while you're iterating over it:

In [11]: A = [1, 2, 3]

In [12]: for i in range(len(A)):
    ...:     del A[i]
    ...:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-12-1ffb9090e54f> in <module>()
      1 for i in range(len(A)):
----> 2     del A[i]
      3

IndexError: list assignment index out of range

Printing each element before deleting it shows that elements also get skipped:

In [21]: A = [1, 2, 3]

In [22]: for i in range(len(A)):
    ...:     print(A[i])
    ...:     del A[i]
    ...:
1
3
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-af7e1866dc89> in <module>()
      1 for i in range(len(A)):
----> 2     print(A[i]);del A[i]
      3
      4

IndexError: list index out of range

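This is what happens with del movie_data[i] in your loop: range(len(movie_data)) is evaluated once, using the original length, so after enough deletions i runs past the end of the shortened list.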
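One way around this, sketched with a small made-up list standing in for movie_data (the rows and gross values are illustrative, not taken from the dataset), is to build a new list of the rows to keep instead of deleting from the list being indexed. Filtering the gross values in the same pass keeps the two lists aligned, which the loop in the question would not do, since gross_data is never shortened there.

```python
# Toy stand-ins for movie_data and gross_data; the values are illustrative only.
movie_data = [
    ["Color", "A"],
    ["Black and White", "B"],
    ["Black and White", "C"],
    ["Color", "D"],
]
gross_data = [100, 200, 300, 400]

# Keep only (row, gross) pairs whose first column is 'Color'.
kept = [(row, gross) for row, gross in zip(movie_data, gross_data)
        if row[0] == "Color"]
movie_data = [row for row, _ in kept]
gross_data = [gross for _, gross in kept]

print(movie_data)
print(gross_data)

# Alternative: delete in place, walking the indices from the end so that
# a deletion never shifts an element that has not been visited yet.
rows = [["Color", "A"], ["Black and White", "B"], ["Color", "D"]]
for i in range(len(rows) - 1, -1, -1):
    if rows[i][0] != "Color":
        del rows[i]
print(rows)
```

The list comprehension is usually the simpler choice; the reverse loop is only worth it when the list must be modified in place.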

Andy Hayden

If you just want the non-color movies, you can stay in pandas and select them with a boolean mask:

Code:

bw = read[read.color != 'Color']

Test Code:

read = pd.read_csv('movie_metadata.csv', sep=',')
bw = read[read.color != 'Color']
print(bw.head())

Results:

                color    director_name  num_critic_for_reviews  duration 
4                 NaN      Doug Walker                     NaN       NaN   
111   Black and White      Michael Bay                   191.0     184.0   
149   Black and White     Lee Tamahori                   264.0     133.0   
257   Black and White  Martin Scorsese                   267.0     170.0   
272   Black and White     Michael Mann                   174.0     165.0   
....
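The question asks for the opposite selection (keep only the 'Color' rows), which is the same mask inverted. Below is a minimal sketch on a small in-memory frame standing in for the CSV. The 'color' column name comes from the output above; the 'gross' column name and all the values are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for pd.read_csv('movie_metadata.csv').
# The 'gross' column name and all values here are assumed for illustration.
read = pd.DataFrame({
    "color": ["Color", "Black and White", None, "Color"],
    "director_name": ["A", "B", "C", "D"],
    "gross": [100.0, 200.0, 300.0, 400.0],
})

# Boolean mask: True only where color is exactly 'Color'.
# Missing values compare unequal, so those rows are dropped as well.
color_only = read[read["color"] == "Color"]

# Split off the target after filtering so features and target stay aligned.
gross = color_only["gross"]
features = color_only.drop(columns=["gross"])

print(features)
print(gross.tolist())
```

Filtering before separating the target avoids the bookkeeping needed to keep two parallel lists in sync.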
Stephen Rauch