EDIT: I may not have been clear enough, so I'm adding an image that hopefully clears things up.

https://ibb.co/BftNM9K The Database is given, and so is the column Josh. I'd like to get a list/Series/DataFrame that contains all of the Josh values except 75, 910, 306, 561 (which are all values contained in the Database).

Pardon if this is a dumb question. I'm an electrical engineer and I'm trying to write a little snippet in Python.

I have a DataFrame db that can be made up of tens of columns and another DataFrame df that's made up of a single column.

I'd like to check every single item in df and make sure it's present somewhere in db (it doesn't matter in which row or column). If it's not, I'd like to store that "missing" item in a completely new DataFrame (or Series, whichever is easier).

Here's what I'm testing right now. I tried different approaches to check for item presence, like:

  • db[db.eq(new_lib_ref.iloc[i]).any(axis=1)]

The full test script:

    # add libraries
    import pandas as pd
    
    # Create the single-column DataFrame with the items to check
    data = [['tom'], ['nick'], ['juli']]
    df = pd.DataFrame(data, columns=['Name'])
    
    # Create the multi-column "database" DataFrame
    data = [['susan', 'peter'], ['tom', 'ollie'], ['jack', 'nick']]
    db = pd.DataFrame(data, columns=['Name', '2nd Name'])
    
    print(df.iloc[0])
    print(type(df.iloc[0]))
    print(db)
    print(type(db))
    df.reset_index(inplace=True, drop=True)
    db.reset_index(inplace=True, drop=True)
    
    # collect all new components not yet in the db
    missing = []
    
    for i in range(len(df)):
        item = df.iloc[i, 0]  # scalar value, not a one-element Series
        found = False
        for j in range(db.shape[0]):
            for k in range(db.shape[1]):
                if db.iloc[j, k] == item:
                    found = True
                    break
            if found:
                break  # also leave the row loop once the item is found
        # only record the item after the whole db has been searched
        if not found:
            missing.append(item)
    
    # build the result and remove duplicates from new_comp
    new_comp = pd.DataFrame({'Name': missing}).drop_duplicates()
    print(new_comp)
danysanca
  • something went wrong with formatting the code, I'm trying to edit it – danysanca Sep 18 '20 at 08:35
  • For every value in df, I want to check if it's present in db – danysanca Sep 18 '20 at 08:50
  • IIUC, you need `df.loc[~df['Name'].isin(db.stack()),'Check'] = df['Name']` failing that please provide a sample of your expected output – Umar.H Sep 18 '20 at 08:53
  • expected output: Name Check 0 juli – danysanca Sep 18 '20 at 08:59
  • I'm having some trouble with formatting. Basically the output should only contain juli because it's the only value not present in the db. Your code instead returns a 2x3 dataframe – danysanca Sep 18 '20 at 09:01
  • you can use the boolean as a filter eg `df[~df['Name'].isin(db.stack())]` – Umar.H Sep 18 '20 at 09:05
  • Does this answer your question? [How to filter Pandas dataframe using 'in' and 'not in' like in SQL](https://stackoverflow.com/questions/19960077/how-to-filter-pandas-dataframe-using-in-and-not-in-like-in-sql) – Umar.H Sep 18 '20 at 09:10
  • It doesn't. I tried (df.iloc[0]).isin(db) but it returns False, when it should return True. I'm not sure why that happens. Also I don't know what SQL is. – danysanca Sep 18 '20 at 09:44
  • I just realized that the proposed solution in the questions linked doesn't work because they're comparing 2 lists. What I need to do is to compare a matrix with every element in a list. – danysanca Sep 18 '20 at 10:31
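
For reference, the filter suggested in the comments can be sketched on the sample data from the question roughly as follows. This is only a minimal sketch, not a definitive answer: it flattens db with `db.to_numpy().ravel()` rather than the `db.stack()` mentioned in the comments (both produce a flat collection of every cell), and the names `db_values`, `present` and `missing` are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Name': ['tom', 'nick', 'juli']})
db = pd.DataFrame({'Name': ['susan', 'tom', 'jack'],
                   '2nd Name': ['peter', 'ollie', 'nick']})

# Flatten the whole "matrix" into a 1-D array of all its cells,
# so that row and column position no longer matter
db_values = db.to_numpy().ravel()

# True for every item of df that appears anywhere in db
present = df['Name'].isin(db_values)

# Keep only the items that are NOT present, without duplicates
missing = df.loc[~present, 'Name'].drop_duplicates().reset_index(drop=True)

print(missing.tolist())  # ['juli']
```

The key point is that `isin` is called on the column (`df['Name']`) and is given the flattened values of db, so every element of the list is compared against every cell of the matrix in one step.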

0 Answers