I know there are similar questions and answers here, but I can't seem to find the exact solution.

I want to find rows in which all but one column are identical.

For example, given:

     ColumnA     ColumnB     ColumnC    ColumnD  ColumnE  
1      John        Texas       USA        115       5
2      Mike        Florida     USA        66        1
3      John        Texas       USA        115       4
4      Justin      NewYork     USA        22        11

So the logic I'm trying to get is:

for every entry in the dataframe:
    if there exists another entry with all columns identical apart from ColumnE
    AND
    the value of ColumnE in the first entry minus the value of ColumnE in the second entry is less than 1:
        then append the entry to a new DataFrame

So far, I have used df.loc and df.duplicated to get partway there. The actual problem and data are a little more complicated, so I would not be able to post the code here.

Any help with this would be super appreciated.

Thanks, Rob

  • Please provide a sample of what you WANT the table to look like after it is processed. Read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) article on how to post a good reproducible question. – Ukrainian-serge Mar 08 '20 at 01:53

1 Answer

I'm not sure exactly what format you want your result in, so I made a dictionary where each key is the index of a given row and the value is a list of the indices of rows that differ from it in exactly one entry:

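import pandas as pd

def ndif(a, b):
    """Count the positions at which rows a and b differ."""
    return sum(x != y for x, y in zip(a, b))

d = pd.DataFrame([[1, 2, 3], [1, 2, 4], [3, 2, 4], [3, 0, 4], [5, 0, 3]])

# Map each row index to the indices of rows that differ from it in exactly one column.
just1 = {}
for k in d.index:
    counts = d.apply(ndif, args=(d.loc[k],), axis=1)
    just1[k] = [i for i, n in counts.items() if n == 1]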
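
For the table layout in the question, a merge-based sketch may be closer to what was asked. It is only an illustration under a few assumptions: the columns that must match are ColumnA through ColumnD, "another entry" means a row with a different index, and two rows count as close when the absolute difference in ColumnE is at most `max_gap`. The names `key_cols` and `max_gap` are made up for this example, and the threshold of 1 is a guess, since the question says "less than 1" while the sample rows differ by exactly 1.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ColumnA": ["John", "Mike", "John", "Justin"],
        "ColumnB": ["Texas", "Florida", "Texas", "NewYork"],
        "ColumnC": ["USA", "USA", "USA", "USA"],
        "ColumnD": [115, 66, 115, 22],
        "ColumnE": [5, 1, 4, 11],
    },
    index=[1, 2, 3, 4],
)

key_cols = ["ColumnA", "ColumnB", "ColumnC", "ColumnD"]  # assumed: must match exactly
max_gap = 1  # assumed threshold for the ColumnE difference

# Self-merge on the key columns to pair each row with every row sharing those values.
left = df.reset_index()  # the unnamed index becomes a column called "index"
pairs = left.merge(left, on=key_cols, suffixes=("", "_other"))

# Drop the pairs in which a row is matched with itself.
pairs = pairs[pairs["index"] != pairs["index_other"]]

# Keep the pairs whose ColumnE values are close enough.
close = pairs[(pairs["ColumnE"] - pairs["ColumnE_other"]).abs() <= max_gap]

# Collect the original rows that have at least one such partner.
result = df.loc[sorted(close["index"].unique())]
print(result)
```

With the sample data this keeps rows 1 and 3 (the two John entries). If the comparison should be strict, or should depend on the sign of the difference, only the line that builds `close` needs to change.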
kpie