I am practicing regression using the IMDB 5000+ movie metadata set on Kaggle. I am using the pandas library to read the CSV file and convert the data to a nested list named movie_data.
I want to delete every row movie_data[n] where movie_data[n][0] != 'Color'. I tried to do the deletion in a for loop, but the code raises this error at i == 4827:
IndexError: list index out of range
Here is my code:
import tensorflow as tf
import numpy as np
import pandas as pd
tf.set_random_seed(777)
read = pd.read_csv('movie_metadata.csv', sep=',')
movie_data = read.values.tolist()
gross_data = []
for i in range(len(movie_data)):
    gross_data.append(movie_data[i][8])
# delete the gross column from every row
for row in movie_data:
    del row[8]
# remove movies that are not in color (e.g. black and white)
for i in range(len(movie_data)):
    print(i)
    if movie_data[i][0] != 'Color':
        del movie_data[i]
training_movie_data = movie_data[0:3500]
training_gross_data = gross_data[0:3500]
#print(training_movie_data)
The error occurs at line 20: if movie_data[i][0] != 'Color'
How can I fix this?
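For context on why this happens: range(len(movie_data)) is evaluated once, before the loop starts, but each del shortens the list, so i eventually points past the end. Deleting by index while looping also skips the row right after each deleted one. A minimal sketch of one way around it is to build a new list instead of deleting in place, and to take the gross values from the same filtered rows so the two lists stay aligned. The function name split_color_rows and the tiny sample rows are hypothetical, made up for illustration; in the code above the gross column index would be 8.

```python
def split_color_rows(rows, gross_col):
    """Keep only rows whose first field is 'Color'.

    Returns (features, gross): the kept rows without the gross column,
    and the gross values of those same rows, in the same order.
    """
    # Build a new list instead of deleting while iterating
    color_rows = [row for row in rows if row[0] == 'Color']
    # Take gross from the filtered rows so both lists stay aligned
    gross = [row[gross_col] for row in color_rows]
    # Copy each row without the gross column (the input is not modified)
    features = [row[:gross_col] + row[gross_col + 1:] for row in color_rows]
    return features, gross


# Hypothetical sample data: column 0 is the color field, column 1 is gross
sample = [
    ['Color', 100.0, 'A'],
    ['Black and White', 50.0, 'B'],
    ['Black and White', 20.0, 'C'],
    ['Color', 75.0, 'D'],
]
features, gross = split_color_rows(sample, gross_col=1)
print(features)
print(gross)
```

With the real data this would be called as split_color_rows(movie_data, 8) right after read.values.tolist(), before the gross column is deleted, and the training slices would then be taken from the two returned lists.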