Compare values from two pandas data frames, order-independent

Question

I am new to data science. I want to check which elements from one data frame exist in another data frame, e.g.

df1 = [1,2,8,6]
df2 = [5,2,6,9]

# for 1 output should be False

# for 2 output should be True

# for 6 output should be True

etc.

Note: I have a matrix, not a vector.

I have tried using the following code:

import pandas as pd
import numpy as np

priority_dataframe = pd.read_excel(prioritylist_file_path, sheet_name='Sheet1', index_col=None)

# Map each column name to its lower-cased, non-null values.
priority_dict = {column: np.array(priority_dataframe[column].dropna(axis=0, how='all').str.lower())
                 for column in priority_dataframe.columns}
keys_found_per_sheet = []
if file_path.lower().endswith('.csv'):
    file_dataframe = pd.read_csv(file_path)
else:
    file_dataframe = pd.read_excel(file_path, sheet_name=sheet, index_col=None)

# Collect every non-null cell of the file as a string.
file_cell_array = list()
for column in file_dataframe.columns:
    for file_cell in np.array(file_dataframe[column].dropna(axis=0, how='all')):
        if isinstance(file_cell, str):
            file_cell_array.append(file_cell)
        else:
            file_cell_array.append(str(file_cell))

converted_file_cell_array = np.array(file_cell_array)

# Record each priority column that has at least one value in the file.
for key, values in priority_dict.items():
    for priority_cell in values:
        if priority_cell in converted_file_cell_array[:]:
            keys_found_per_sheet.append(key)
            break

Am I doing something wrong in `if priority_cell in converted_file_cell_array[:]`?

Is there a more efficient way to do this?

Can you add some data samples and expected output? I think [mcve](https://stackoverflow.com/help/mcve) — jezrael, Apr 03 '18 at 06:04
Possible duplicate of [Confirming equality of two pandas dataframes?](https://stackoverflow.com/questions/38212697/confirming-equality-of-two-pandas-dataframes) — Jared Wilber, Apr 03 '18 at 06:08
@JaredWilber, not really, because I want to check whether each element of one data frame exists in the other data frame. — Piyush S. Wanare, Apr 03 '18 at 06:11
In other words, you want to check if two dataframes have exactly the same elements, but the positions do not matter, right? — DYZ, Apr 03 '18 at 06:15
My bad. I think you should further clarify the question, I'm still confused what you're asking. — Jared Wilber, Apr 03 '18 at 06:15
@DyZ, sorry, I want to check which elements from one data frame exist in another data frame. — Piyush S. Wanare, Apr 03 '18 at 06:18
@DyZ, anywhere, at any position. I have updated my question, please check. — Piyush S. Wanare, Apr 03 '18 at 06:23
You can take the `.values` from each dataframe, convert them to a `set()`, and take the set difference (subtract the sets). — DYZ, Apr 03 '18 at 06:27
@PiyushS.Wanare - Are the values of the same type? Is it possible to use `df1.isin(df2.values.ravel())`? — jezrael, Apr 03 '18 at 06:28
@DyZ, I think this could be the answer. Can you post an answer with an example? — Piyush S. Wanare, Apr 03 '18 at 06:34
**You say "dataframe" but you show a 1D dataframe/Series, then you say "Note: I have matrix not vector". Which is it?** And `df1 = [1,2,8,6]` is a plain Python list, not any of those. Please give executable code (MCVE). — smci, Aug 13 '19 at 09:35

jezrael · Answer 1 · 2018-04-03T06:59:23.447

2

You can flatten all values of both DataFrames with numpy.ravel and then use set.intersection():

df1 = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df1)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

df2 = pd.DataFrame({'A':[2,3,13,4], 'Z':list('abfr')})
print (df2)
    A  Z
0   2  a
1   3  b
2  13  f
3   4  r

L = list(set(df1.values.ravel()).intersection(df2.values.ravel()))
print (L)
['f', 2, 3, 4, 'a', 'b']
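
If a True/False value per element is needed rather than the list of common values, one option (suggested in the comments on the question) is `DataFrame.isin` applied to the flattened values of the other frame. A minimal sketch on small made-up frames; the data and the names `mask` and `found` are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Small made-up frames, only for illustration.
df1 = pd.DataFrame({'A': [1, 2], 'B': [8, 6]})
df2 = pd.DataFrame({'X': [5, 2], 'Y': [6, 9]})

# Flatten df2 to a 1-D array, then test every cell of df1 against it.
# The result has the same shape as df1 and holds True/False per cell.
mask = df1.isin(df2.values.ravel())
print(mask)

# Values of df1 that also occur anywhere in df2.
found = sorted(set(df1.values[mask.values].tolist()))
print(found)
```

The boolean frame keeps the row and column labels of `df1`, so it can be used directly for filtering or further per-cell processing.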

edited Apr 03 '18 at 06:59

answered Apr 03 '18 at 06:32


I have already checked this, but I want True/False output for each element so that I can do other things with it. – Piyush S. Wanare Apr 03 '18 at 06:33
@PiyushS.Wanare - What is the expected output? Is a `dictionary` of booleans correct? – jezrael Apr 03 '18 at 06:40
I want a list of all existing elements. – Piyush S. Wanare Apr 03 '18 at 06:49
@PiyushS.Wanare - then `list(set(a).intersection(b))` should work. – jezrael Apr 03 '18 at 06:51

DYZ · Accepted Answer · 2018-04-03T07:00:35.573

2

You can take the .values from each dataframe, convert them to a set(), and take the set intersection.

set1 = set(df1.values.reshape(-1).tolist())
set2 = set(df2.values.reshape(-1).tolist())
common = set1 & set2  # elements present in both data frames
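
A self-contained sketch of the set approach on made-up frames may help here. The data and the names `common`, `only_in_df1`, and `exists` are illustrative assumptions. The intersection gives the values present in both frames, the difference gives the values of `df1` that are missing from `df2`, and a dictionary gives the True/False answer per element that the question asks for:

```python
import pandas as pd

# Small made-up frames, only for illustration.
df1 = pd.DataFrame({'A': [1, 2], 'B': [8, 6]})
df2 = pd.DataFrame({'X': [5, 2], 'Y': [6, 9]})

# Flatten each frame and collect its distinct values.
set1 = set(df1.values.reshape(-1).tolist())
set2 = set(df2.values.reshape(-1).tolist())

common = set1 & set2        # values found in both frames
only_in_df1 = set1 - set2   # values of df1 that are absent from df2

# True/False for every distinct value of df1.
exists = {value: value in set2 for value in sorted(set1)}

print(sorted(common))
print(sorted(only_in_df1))
print(exists)
```

Set membership tests run in roughly constant time, so this should scale better than repeatedly scanning an array with `in`.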

edited Apr 03 '18 at 07:00

answered Apr 03 '18 at 06:39


Got error `AttributeError: 'builtin_function_or_method' object has no attribute 'reshape'`. – Piyush S. Wanare Apr 03 '18 at 06:48
Is `df1` indeed a DataFrame? – DYZ Apr 03 '18 at 06:54
Sorry, it was my mistake. – Piyush S. Wanare Apr 03 '18 at 06:54
I want to know which elements from set1 exist in set2. Will set difference work? I don't think so. – Piyush S. Wanare Apr 03 '18 at 06:59
@PiyushS.Wanare - Or use the function `set.intersection`; then the second list does not need to be converted to a set. – jezrael Apr 03 '18 at 07:03
