Is there a more efficient way to get all rows that are related in some way to any other row in the same df (equality used in this example, actual function is a bit more complex):
import pandas as pd
from pydataset import data
df = data('iris')
df = df[df.index<10]
#adding ID col for ease of ref
df['ID'] = df.index
df
Sepal.Length Sepal.Width Petal.Length Petal.Width Species ID
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 2
3 4.7 3.2 1.3 0.2 setosa 3
4 4.6 3.1 1.5 0.2 setosa 4
5 5.0 3.6 1.4 0.2 setosa 5
6 5.4 3.9 1.7 0.4 setosa 6
7 4.6 3.4 1.4 0.3 setosa 7
All I can come up with is a df.copy()
and then loop:
df_copy = df.copy()
df_want = pd.DataFrame(columns=['ID','Sepal.Length','Sepal.Width','ExID', 'ExSepal.Length', 'ExSepal.Width'])
for row in range(0, df.shape[0]):
for row2 in range(0, df_copy.shape[0]):
if (df.iloc[row]['ID'] != df_copy.iloc[row2]['ID'] and
df.iloc[row]['Sepal.Length'] == df_copy.iloc[row2]['Sepal.Length']):
df_want = df_want.append({'ID':df.iloc[row]['ID'],
'Sepal.Length':df.iloc[row]['Sepal.Length'],
'Sepal.Width':df.iloc[row]['Sepal.Width'],
'ExID':df_copy.iloc[row2]['ID'],
'ExSepal.Length':df_copy.iloc[row2]['Sepal.Length'],
'ExSepal.Width':df_copy.iloc[row2]['Sepal.Width']},
ignore_index=True)
df_want
ID Sepal.Length Sepal.Width ExID ExSepal.Length ExSepal.Width
4.0 4.6 3.1 7.0 4.6 3.4
5.0 5.0 3.6 8.0 5.0 3.4
7.0 4.6 3.4 4.0 4.6 3.1
8.0 5.0 3.4 5.0 5.0 3.6
So row_ID
4 is the same as 7, and 5 is the same as 8, and so on..
Tried searching quite a bit, closest I found was this Select rows from a DataFrame based on values in a column in pandas but struggling to use isin()
in this case.