I have a dataframe to which I'd like to add a column "exists", based on whether each item exists in another dataframe.
Using the isin function against that other dataframe only returns 1 match. The same happens with a loc filter when I set the column I want to filter on as the index.
It doesn't work as expected when I pass a column of another DataFrame (or a list built from it), like this:
table.loc[table.index.isin(tableOther['column']), : ]
In this case it only returns 1 item.
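For reference, here is a minimal sketch of the result I'm after. The data is made up (the `Volume` column and all values are hypothetical stand-ins for my CSV files), and with it isin behaves the way I'd expect:

```python
import pandas as pd

# Made-up stand-ins for the two CSV files (hypothetical data)
table = pd.DataFrame({"Keyword": ["apple", "banana", "cherry"],
                      "Volume": [10, 20, 30]})
tableSubject = pd.DataFrame({"subjects": ["banana", "cherry", "durian"]})

# True where the keyword also appears in the subjects column
table["exists"] = table["Keyword"].isin(tableSubject["subjects"])
print(table)
```

With my real files, the same kind of check flags only a single row.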
import pandas as pd
import numpy as np
# Source that I'd like to enrich with an additional column
table = pd.read_csv('keywordsDataSource.csv', encoding='utf-8', delimiter=';', index_col='Keyword')
# Source to compare keywords against
tableSubject = pd.read_csv('subjectDataSource.csv', encoding='utf-8', names=["subjects"])
### This column based check only returns 1 - seemingly random - match ###
table.loc[table.index.isin(tableSubject['subjects']), : ]
--------------
######## also tried it like this:
# Source that I'd like to enrich with an additional column
table = pd.read_csv('keywordsDataSource.csv', encoding='utf-8', delimiter=';')
# Source to compare keywords against
tableSubject = pd.read_csv('subjectDataSource.csv', encoding='utf-8', names=["subjects"])
mask = table['Keyword'].isin(tableSubject.subjects)
table[mask]
I've also tried using .query, and converting the subjects column to a list; both end with the same single-match result as above.
As the output is the same in every attempt, I suspect the problem lies in the data source.
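One way I could test that suspicion is sketched below. It assumes the mismatch comes from stray whitespace or differences in letter case, which I have not confirmed; the `normalize` helper and the sample values are hypothetical. isin only matches exact values, so `"Banana "` and `"banana"` would count as different:

```python
import pandas as pd

def normalize(values: pd.Series) -> pd.Series:
    """Cast to string, trim surrounding whitespace, and lowercase."""
    return values.astype(str).str.strip().str.lower()

# Hypothetical data with the kind of debris I suspect in my files
table = pd.DataFrame({"Keyword": ["apple", "Banana ", " cherry"]})
tableSubject = pd.DataFrame({"subjects": ["apple", "banana", "cherry"]})

# Exact comparison versus comparison on cleaned-up values
raw_mask = table["Keyword"].isin(tableSubject["subjects"])
clean_mask = normalize(table["Keyword"]).isin(normalize(tableSubject["subjects"]))

print("raw matches:", raw_mask.sum(), "normalized matches:", clean_mask.sum())

# repr() makes hidden whitespace visible in the unmatched keywords
print([repr(value) for value in table.loc[~raw_mask, "Keyword"]])
```

If the normalized count is much higher than the raw count on the real files, that would point at the data rather than at isin.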
Thank you for your thoughts!