I have a dataframe with two columns: 'VENDOR_ID' and 'GL_Transaction_Description'. I want to print every row of the 'GL_Transaction_Description' column that contains any value from the 'VENDOR_ID' column.
VENDOR_ID | GL_Transaction_Description |
---|---|
123 | HELLO 345 |
456 | BYE 456 |
987 | THANKS 456 |
The desired output here would be 'BYE 456' and 'THANKS 456'. My code is as follows:
for k in range(len(df)):
    for j in range(len(df)):
        if df['VENDOR_ID'][k] in df['GL_Transaction_Description'][j] and df['VENDOR_ID'][k] != 'nan':
            print(df['GL_Transaction_Description'][j])
But this particular dataframe has more than 100k rows, and the nested for loop takes forever to run. Any ideas on how to make this execute faster? I have read that using numpy usually makes things much faster, but I haven't been able to implement it.
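
For reference, here is a minimal sketch of one vectorized direction that avoids the nested loop. It builds a single regular expression from the unique vendor IDs and tests every description against it in one pass with `str.contains`. It assumes both columns can be treated as strings and that a plain substring match (the same semantics as `in` in the loop above) is what is wanted. The sample frame mirrors the table above, and the names `vendor_ids`, `pattern`, `mask` and `matches` are only illustrative.

```python
import re

import pandas as pd

# Sample data mirroring the table in the question (IDs assumed to be strings).
df = pd.DataFrame({
    "VENDOR_ID": ["123", "456", "987"],
    "GL_Transaction_Description": ["HELLO 345", "BYE 456", "THANKS 456"],
})

# Unique vendor IDs as strings, skipping missing values and the literal 'nan'.
vendor_ids = [
    v for v in df["VENDOR_ID"].dropna().astype(str).unique()
    if v and v != "nan"
]

descriptions = df["GL_Transaction_Description"].astype(str)

if vendor_ids:
    # One alternation pattern; re.escape guards against regex metacharacters.
    pattern = "|".join(re.escape(v) for v in vendor_ids)
    mask = descriptions.str.contains(pattern, regex=True, na=False)
else:
    # No usable IDs: nothing can match.
    mask = pd.Series(False, index=df.index)

matches = df.loc[mask, "GL_Transaction_Description"]
print(matches.tolist())
```

One difference from the loop: this yields each matching description once, whereas the nested loop prints a description once for every vendor ID it contains. With a very large number of distinct IDs the alternation pattern itself can get slow, so this is a starting point rather than a guaranteed fix.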