I want to compare the values in pandas

Question

I have two dataframes. First one:

import pandas as pd
a = [['xxx', 'admin'], ['yyy', 'admin,super admin'], ['zzz', 'guest,admin,superadmin']]
df1 = pd.DataFrame(a, columns=['user', 'groups'])

second one:

b = [['xxx', 'admin,super admin'], ['www', 'admin,super admin'], ['zzz', 'guest,superadmin']]
df2 = pd.DataFrame(b, columns=['user', 'groups'])

this is the first one:

 user  groups
0  xxx  admin
1  yyy  admin,super admin
2  zzz  guest,admin,superadmin

this is the second one:

 user  groups
0  xxx  admin,super admin
1  www  admin,super admin
2  zzz  guest,superadmin

I want to do two things:

if the second one's user is not in the first one, then print out. like: www is not in the list
if the user is in the list, but group is not equal then print out:
like xxx user have more: super admin than the list
zzz user has less: admin than the list.

There is ame rows in both DataFrames? Do you need compare both `DataFrame`s by rows - 1. from one df with 1. form second df, 2. from one df with 2. form second df ? — jezrael, May 18 '19 at 13:12
if second one has the user that not in the first one , print it out. if user is in the list ,but groups are not equal print it out. — LLY, May 18 '19 at 13:33
What have you tried so far? Don't expect people to write code for you. Tell us what you've tried and we try to find out why it didn't work and how you can fix it. http://idownvotedbecau.se/noattempt/ — Amir Shabani, May 18 '19 at 15:55

jezrael · Accepted Answer · 2019-05-18T13:53:37.327

0

If there are same index values ina number of length in both DataFrame and need compare values per rows:

print (df1.index.equals(df2.index))
True

#compare rows for not equal
mask = df1['user'].ne(df2['user'])
#filter rows by mask and column user in df2
a = df2.loc[mask, 'user'].tolist()
print (a)
['www']

#join both DataFrames together
df1 = pd.concat([df1, df2], axis=1, keys=('a','b'))
df1.columns  = df1.columns.map('_'.join)
#filter only same user rows
df1 = df1[~mask]
#split columns by , ans convert to sets
df1['a'] = df1['a_groups'].apply(lambda x: set(x.split(',')))
df1['b'] = df1['b_groups'].apply(lambda x: set(x.split(',')))
#get difference of sets, join to strings with separator ,
df1['a_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['b'],df1['a'] )]
df1['b_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['a'],df1['b'] )]
print (df1)
  a_user                a_groups b_user           b_groups  \
0    xxx                   admin    xxx  admin,super admin   
2    zzz  guest,admin,superadmin    zzz   guest,superadmin   

                            a                     b       a_diff b_diff  
0                     {admin}  {admin, super admin}  super admin         
2  {admin, superadmin, guest}   {superadmin, guest}               admin

#filter by casting set columns to boolean, empty sets are converted to False
b = df1.loc[df1['a_diff'].astype(bool), ['a_user','a_diff']]
print (b)
  a_user       a_diff
0    xxx  super admin

c = df1.loc[df1['b_diff'].astype(bool), ['a_user','b_diff']]
print (c)
  a_user b_diff
2    zzz  admin

edited May 18 '19 at 13:53

answered May 18 '19 at 13:39

jezrael

822,522
95
1,334
1,252

compare = df1["user"].tolist() compare1 = df2["user"].tolist() print(list(set(compare1) - set(compare))) print('is not in the list') for the first one , I tried this – LLY May 18 '19 at 13:50
@LLY - `print (df1.index.equals(df2.index))` return `True` in your real data? – jezrael May 18 '19 at 13:54
I didn't try that , I tried your step one ,it works , mine also works , I didn't try your step two yet. looks pretty good. thanks for your help , I will test and let you know. – LLY May 18 '19 at 13:58
@LLY - If working first, then should be also second part of solution too. – jezrael May 18 '19 at 13:59
the index length is not the same – LLY May 19 '19 at 02:19
@LLY - So is possible change sample data with expecred output? E.g. second DataFrame have 8 rows, first 3. What is logic for check values? Because first is easy, only necessary change `mask = df1['user'].ne(df2['user'])` to `mask = df1['user'].isin(df2['user'])`. But in second is used concat for join DataFrame together, but if different length, then in last rows are appended NaNs. Only idea is use [cross join](https://stackoverflow.com/questions/13269890/cartesian-product-in-pandas) instead concat, but still not sure if need it. – jezrael May 19 '19 at 05:29

I want to compare the values in pandas

1 Answers1