I've been getting this weird error. I've already fixed it a bunch of times so I know exactly what it means: the row I am trying to access does not exist.
Fair enough.
When I check the file, the row exists. This means that either I am loading the file incorrectly (which I doubt, since I would expect an error to be thrown in that case), or something else is very wrong.
I've made my vocabulary a nice class, and I initialize it at the start of my program. This is the init:
    def __init__(self, vocab_path, training_path):
        data = pd.read_csv(training_path, sep='|', escapechar='\\', encoding='utf-8')
        self.training_dataset = data.drop(['0'], axis=1)
        self.vocab_path = vocab_path
        try:
            self.vocab = pd.read_csv(vocab_path, sep='|', encoding='latin-1')
        except FileNotFoundError:  # The vocab file does not exist yet, so build it first
            self.training_to_vocab()
            self.vocab = pd.read_csv(vocab_path, sep='|', encoding='latin-1')
And this is the part that causes the bug:
    new_words = []
    for n, index_num in enumerate(batch_to_translate):  # batch_to_translate looks something like [10, 11, 3, 5, 34]
        row = self.vocab[self.vocab['index'] == index_num]['word']
        try:
            new_words.append(row.values[0])  # This line throws the error
        except IndexError as err:
            print('->', index_num, '<- This is the index_num')
            print('->', row, '<- This is the row')
            print(err)
            quit(0)
Why is this happening? No error is thrown during initialization. I do call training_to_vocab() later in my program, so that I don't rely on the except branch: if training_to_vocab() is not called, the vocabulary might not contain the words that batch_to_words needs.
I am not sure what is happening here. Even a very small index_num (e.g. 6) cannot be found.
This is the start of my vocabulary.csv:
index|word
-|hello
-|there
-|this
-|is
-|junk
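For reference, here is a minimal sketch that repeats the same lookup on just these five rows. It assumes the file content is exactly as shown above; io.StringIO stands in for the real vocabulary.csv, and the value 3 is an arbitrary index_num picked for illustration.

```python
import io
import pandas as pd

# Stand-in for vocabulary.csv, using only the five rows shown above.
sample = "index|word\n-|hello\n-|there\n-|this\n-|is\n-|junk\n"
vocab = pd.read_csv(io.StringIO(sample), sep='|')

# Inspect what pandas actually loaded into the 'index' column.
index_values = vocab['index'].tolist()
print(index_values)

# Same filter as in the loop above, with an arbitrary index_num of 3.
row = vocab[vocab['index'] == 3]['word']
print(len(row))  # an empty selection means row.values[0] raises IndexError
```

Printing the column contents and the length of the selection makes it easy to see whether the filter can match anything at all.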
EDIT
I think I know what the problem is. self.vocab['index'] == index_num literally searches for that number in the values of the 'index' column. Because every value in that column is -, it will never find a match. Does anyone know a way to look rows up by their pandas row position instead?
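One possible approach, sketched under the assumption that index_num is meant to be the 0-based position of the row in the file: select by position with .iloc instead of comparing against the 'index' column. The sample data and the batch values below are made up for illustration, and io.StringIO again stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for vocabulary.csv, using only the five rows shown above.
sample = "index|word\n-|hello\n-|there\n-|this\n-|is\n-|junk\n"
vocab = pd.read_csv(io.StringIO(sample), sep='|')

# Hypothetical batch; each entry is treated as a 0-based row position.
batch_to_translate = [0, 3, 4]

# .iloc selects rows by position and ignores the contents of the 'index' column.
new_words = vocab['word'].iloc[batch_to_translate].tolist()
print(new_words)  # ['hello', 'is', 'junk']
```

If the numbers in the batch are 1-based, subtract 1 from each before the lookup. A position past the end of the file still raises an IndexError, so the existing try/except remains useful for spotting words that are missing from the vocabulary.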