EDIT: I may not have been clear enough, so I'm adding an image that hopefully clears things up.
https://ibb.co/BftNM9K The Database is given, and so is the column Josh. I'd like to get a list/Series/DataFrame that contains all of the values in Josh except 75, 910, 306 and 561 (which are all values contained in the Database).
Pardon me if this is a dumb question. I'm an electrical engineer and I'm trying to write a little snippet in Python.
I have a DataFrame db that can be made up of tens of columns and another DataFrame df that's made up of a single column.
I'd like to check every single item in df and make sure it's present somewhere in db (it doesn't matter in which row or column). If it's not, I'd like to store that "missing" item in a completely new DataFrame (or Series, whatever).
Here's what I'm testing right now. I tried different approaches to check for item presence, like:
- db[db.eq(new_lib_ref.iloc[i]).any(1)]
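A note on that attempt, as far as I understand it: `db.eq(value)` gives a boolean frame and `.any(axis=1)` reduces it to one flag per row, so indexing `db` with it returns the matching rows rather than a yes/no answer. Below is a minimal sketch of collapsing it into a single presence test, assuming `value` is a scalar. The sample frame and the helper name `is_in_db` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, mirroring the frames built in the snippet.
db = pd.DataFrame(
    [["susan", "peter"], ["tom", "ollie"], ["jack", "nick"]],
    columns=["Name", "2nd Name"],
)


def is_in_db(value, frame):
    """Return True if the scalar `value` occurs in any cell of `frame`."""
    # eq() compares every cell; any(axis=1) collapses to one flag per row;
    # the final any() collapses the rows to a single bool.
    return bool(frame.eq(value).any(axis=1).any())


print(is_in_db("tom", db))   # 'tom' is in column 'Name'
print(is_in_db("juli", db))  # 'juli' is not in db at all
```

This still runs one full comparison per value, so it is probably only reasonable for a short df.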
# add libraries
import pandas as pd
import ntpath
data = [['tom'], ['nick'], ['juli']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name'])
data = [['susan', 'peter'], ['tom', 'ollie'], ['jack', 'nick']]
# Create the pandas DataFrame
db = pd.DataFrame(data, columns=['Name', '2nd Name'])
# create new df to store all new components not yet in the db
new_comp = pd.DataFrame()
print(df.iloc[0])
print(type(df.iloc[0]))
print(db)
print(type(db))
df.reset_index(inplace=True,drop=True)
db.reset_index(inplace=True,drop=True)
df.sort_index(inplace=True)
db.sort_index(inplace=True)
# print(db.shape[0])
# print(db.shape[1])
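for i in range(len(df)):
    # print(i)
    value = df.iloc[i, 0]  # scalar from the single column of df
    found = False
    for j in range(db.shape[0]):
        # print(j)
        for k in range(db.shape[1]):
            if db.iloc[j, k] == value:
                found = True
                break
        if found:
            break
    if not found:
        print(value)
        # append() did not modify new_comp in place (and was removed in pandas 2.0),
        # so the result of concat is assigned back
        new_comp = pd.concat([new_comp, df.iloc[[i]]], ignore_index=True)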
# remove duplicates from new_comp
new_comp = new_comp.drop_duplicates()
print(new_comp)
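For comparison, the whole check can probably be done without explicit loops. This is only a sketch, assuming the values are hashable scalars and that the position inside `db` does not matter: flatten `db` into a set of values, use `Series.isin` to flag which entries of `df['Name']` are present, and keep the rest. The names `known`, `present` and `missing` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([["tom"], ["nick"], ["juli"]], columns=["Name"])
db = pd.DataFrame(
    [["susan", "peter"], ["tom", "ollie"], ["jack", "nick"]],
    columns=["Name", "2nd Name"],
)

# Collect every cell of db, regardless of row or column, into one set.
known = set(db.to_numpy().ravel())

# Boolean mask: True where the df value appears somewhere in db.
present = df["Name"].isin(known)

# Keep only the values that are absent from db, without duplicates.
missing = df.loc[~present, "Name"].drop_duplicates().reset_index(drop=True)

print(missing)
```

With the sample data above, only 'juli' is left in `missing`, since 'tom' and 'nick' both appear somewhere in `db`.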