Pandas `index 0 is out of bounds for axis 0 with size 0`

Question

I've been getting this weird error. I've already fixed it a bunch of times so I know exactly what it means: the row I am trying to access does not exist.

Fair enough.

But when I check the file, the row exists. So either I am loading the file incorrectly (which I doubt, since I would expect an error to be thrown in that case), or something is very wrong.

I've made my vocabulary a nice class, and I initialize it at the start of my program. This is the init:

def __init__(self, vocab_path, training_path):
    data = pd.read_csv(training_path, sep='|', escapechar='\\', encoding='utf-8')
    self.training_dataset = data.drop(['0'], axis=1)  # axis must be passed by keyword in recent pandas
    self.vocab_path = vocab_path
    try:
        self.vocab = pd.read_csv(vocab_path, sep='|', encoding='latin-1')
    except FileNotFoundError:  # The vocab file does not exist yet, so build it first
        self.training_to_vocab()
        self.vocab = pd.read_csv(vocab_path, sep='|', encoding='latin-1')

And this is the part that causes the bug:

new_words = []
for n, index_num in enumerate(batch_to_translate): # Batch to translate looks something like this [10,11,3,5,34]
    row = self.vocab[self.vocab['index'] == index_num]['word']
    try:
        new_words.append(row.values[0]) # Right here throws the error
    except IndexError as err:
        print('->', index_num, '<- This is the index_num',)
        print('->', row, '<- This is the row')
        print(err)
        quit(0)

Why is this happening? No error is thrown during initialization. I also call training_to_vocab() explicitly later in my program rather than relying on the except branch, because if it is not called the vocabulary might lack the words that batch_to_words needs.

I am not sure what is happening here. Even a very small index_num (e.g. 6) cannot be found.

This is the start of my vocabulary.csv.

index|word
-|hello
-|there
-|this 
-|is
-|junk

EDIT

I think I know what the problem is. `self.vocab['index'] == index_num` literally searches for that number in the `index` column. Because every value in that column is `-`, it will never find a match. Does anyone know a way to search using the pandas index instead?
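
The diagnosis in this edit can be reproduced with a minimal sketch. The inline CSV below is a hypothetical stand-in for the start of vocabulary.csv, and the lookup value 3 is arbitrary; it assumes pandas reads the `-` values as plain strings, which matches the output shown in the comments.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first rows of vocabulary.csv.
csv_text = "index|word\n-|hello\n-|there\n-|this\n-|is\n-|junk\n"
vocab = pd.read_csv(io.StringIO(csv_text), sep="|")

# Every value in the 'index' column is the string '-', not a number.
print(vocab["index"].unique())

# Comparing those strings with an integer is False for every row,
# so the boolean mask selects nothing.
row = vocab[vocab["index"] == 3]["word"]
print(len(row))  # 0

# Taking element 0 of an empty result raises IndexError, as in the question.
caught = False
try:
    row.values[0]
except IndexError as err:
    caught = True
    print(err)
```

No error is raised at load time because a column full of `-` is perfectly valid CSV; the failure only appears once an element of the empty selection is requested.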

Can you show some examples of what is expected in the csv file? — Caio Belfort, Jun 16 '18 at 15:29
Is `row` empty when it throws the error? This would indicate that there are no indices in self.vocab that match, you might need to add an exception for such cases. — ALollz, Jun 16 '18 at 15:34
Yes, row is `[]`. It doesn't make any sense though. Index 6 in the file is not empty. — G. Ramistella, Jun 16 '18 at 15:35
If you're literally using your example .csv, there's no index of 6, because Pandas starts with zero indexing like python. https://stackoverflow.com/questions/32249960/in-python-pandas-start-row-index-from-1-instead-of-zero-without-creating-additi — Jeff Ellen, Jun 16 '18 at 15:36
That is the 'start'. My actual vocab has 1000+ entries formatted in the same way as the example. — G. Ramistella, Jun 16 '18 at 15:38
Can you show the output of `self.vocab['index'].sort_values().tolist()` — ALollz, Jun 16 '18 at 15:41
Here it is "['-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-',....." — G. Ramistella, Jun 16 '18 at 15:43
Well, that seems to be the problem. Do you perhaps want to compare it to the actual DataFrame index, and not that column, which seems to be completely useless? — ALollz, Jun 16 '18 at 16:32
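
A minimal sketch of what the last comment suggests, under two assumptions: the values in `batch_to_translate` are zero-based row numbers, and the vocabulary keeps the default index that `read_csv` assigns. The inline CSV and the batch values are hypothetical stand-ins.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first rows of vocabulary.csv.
csv_text = "index|word\n-|hello\n-|there\n-|this\n-|is\n-|junk\n"
vocab = pd.read_csv(io.StringIO(csv_text), sep="|")

# Hypothetical batch; assumed to hold zero-based row numbers.
batch_to_translate = [3, 0, 4]

# Look the rows up by the DataFrame's own index labels rather than
# by the '-' placeholder column.
new_words = vocab.loc[batch_to_translate, "word"].tolist()
print(new_words)  # ['is', 'hello', 'junk']
```

With `.loc`, a label that is missing from the index raises a `KeyError` instead of silently returning an empty selection, so an out-of-range `index_num` would surface at the lookup itself. If the numbers in the batch are one-based, they would need to be shifted by one first.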
