I have a pandas DataFrame and want to delete a row when its text is too similar to another row's text. I get a 'the label [some integer] is not in the [index]' error.

while i < 881:
    ctr=0
    sent=df1.loc[i,"text"]
    print ("SENTENCE:",i,sent)
    for j in range(i+1,len(df1)):
        to_compare=df1.loc[j,"text"]
        sim=similar(sent,to_compare)
        if sim>0.8:
            print ("SIMILAR:",j,to_compare)
            ctr+=1
            df1=df1.drop(j)
            df1=df1.reset_index(drop=True)
        else : 
            i +=1
    print (ctr)

I get the same error with a for loop:

for i in range(10):
    ctr=0
    sent=df1.loc[i,"text"]
    print ("SENTENCE:",i,sent)
    for j in range(i+1,len(df1)):
        to_compare=df1.loc[j,"text"]
        sim=similar(sent,to_compare)
        if sim>0.8:
            print ("SIMILAR:",j,to_compare)
            ctr+=1
            df1=df1.drop(j)
            df1=df1.reset_index(drop=True)
    print (ctr)

Welcome to Stack Overflow. It's easier for us to read data than to read your code, so try to add example data and your expected output. Read more [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on how to make a good pandas question. – Erfan May 10 '19 at 22:52

1 Answer

range(i+1,len(df1))

creates a range whose bounds are fixed when the inner loop starts, so it is not updated when len(df1) changes. Thus, after dropping rows and reindexing, in

to_compare=df1.loc[j,"text"]

you are passing an index label that no longer exists. An easy fix should be to let the inner loop finish before you reindex.
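
One way to apply that fix is sketched below: collect the matching labels while the inner loop runs, then drop them and reset the index once the inner loop is done. This is only a sketch under some assumptions. The question never shows `similar`, so a stand-in based on `difflib.SequenceMatcher` is used here, and `drop_near_duplicates` and the sample data are hypothetical names made up for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def similar(a, b):
    # Assumption: stand-in for the question's undefined similar() helper.
    return SequenceMatcher(None, a, b).ratio()


def drop_near_duplicates(df, column="text", threshold=0.8):
    # Hypothetical helper: start from a clean 0..n-1 index.
    df = df.reset_index(drop=True)
    i = 0
    # len(df) is re-evaluated on every pass, unlike a range built up front.
    while i < len(df):
        sent = df.loc[i, column]
        # Collect labels only; the frame is unchanged while this loop runs.
        to_drop = [
            j
            for j in range(i + 1, len(df))
            if similar(sent, df.loc[j, column]) > threshold
        ]
        # Drop and reindex only after the inner loop has finished.
        df = df.drop(to_drop).reset_index(drop=True)
        i += 1
    return df


# Made-up sample data, not from the question.
df1 = pd.DataFrame(
    {"text": ["hello world", "hello world!", "completely different", "hello worlds"]}
)
result = drop_near_duplicates(df1)
print(result)
```

Because the outer loop is a `while` on the current `len(df)`, it also stops on its own when rows disappear, so no hard-coded bound such as 881 is needed.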

biko68