0

I have two dataframes. First one:

import pandas as pd
a = [['xxx', 'admin'], ['yyy', 'admin,super admin'], ['zzz', 'guest,admin,superadmin']]
df1 = pd.DataFrame(a, columns=['user', 'groups'])

second one:

b = [['xxx', 'admin,super admin'], ['www', 'admin,super admin'], ['zzz', 'guest,superadmin']]
df2 = pd.DataFrame(b, columns=['user', 'groups'])

this is the first one:

 user  groups
0  xxx  admin
1  yyy  admin,super admin
2  zzz  guest,admin,superadmin

this is the second one:

 user  groups
0  xxx  admin,super admin
1  www  admin,super admin
2  zzz  guest,superadmin

I want to do two things:

  1. if the second one's user is not in the first one, then print out. like: www is not in the list

  2. if the user is in the list, but group is not equal then print out:
    like xxx user have more: super admin than the list
    zzz user has less: admin than the list.

Venkatachalam
  • 16,288
  • 9
  • 49
  • 77
LLY
  • 21
  • 5
  • There is ame rows in both DataFrames? Do you need compare both `DataFrame`s by rows - 1. from one df with 1. form second df, 2. from one df with 2. form second df ? – jezrael May 18 '19 at 13:12
  • if second one has the user that not in the first one , print it out. if user is in the list ,but groups are not equal print it out. – LLY May 18 '19 at 13:33
  • What have you tried so far? Don't expect people to write code for you. Tell us what you've tried and we try to find out why it didn't work and how you can fix it. http://idownvotedbecau.se/noattempt/ – Amir Shabani May 18 '19 at 15:55

1 Answers1

0

If there are same index values ina number of length in both DataFrame and need compare values per rows:

print (df1.index.equals(df2.index))
True

#compare rows for not equal
mask = df1['user'].ne(df2['user'])
#filter rows by mask and column user in df2
a = df2.loc[mask, 'user'].tolist()
print (a)
['www']

#join both DataFrames together
df1 = pd.concat([df1, df2], axis=1, keys=('a','b'))
df1.columns  = df1.columns.map('_'.join)
#filter only same user rows
df1 = df1[~mask]
#split columns by , ans convert to sets
df1['a'] = df1['a_groups'].apply(lambda x: set(x.split(',')))
df1['b'] = df1['b_groups'].apply(lambda x: set(x.split(',')))
#get difference of sets, join to strings with separator ,
df1['a_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['b'],df1['a'] )]
df1['b_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['a'],df1['b'] )]
print (df1)
  a_user                a_groups b_user           b_groups  \
0    xxx                   admin    xxx  admin,super admin   
2    zzz  guest,admin,superadmin    zzz   guest,superadmin   

                            a                     b       a_diff b_diff  
0                     {admin}  {admin, super admin}  super admin         
2  {admin, superadmin, guest}   {superadmin, guest}               admin   

#filter by casting set columns to boolean, empty sets are converted to False
b = df1.loc[df1['a_diff'].astype(bool), ['a_user','a_diff']]
print (b)
  a_user       a_diff
0    xxx  super admin

c = df1.loc[df1['b_diff'].astype(bool), ['a_user','b_diff']]
print (c)
  a_user b_diff
2    zzz  admin
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • compare = df1["user"].tolist() compare1 = df2["user"].tolist() print(list(set(compare1) - set(compare))) print('is not in the list') for the first one , I tried this – LLY May 18 '19 at 13:50
  • @LLY - `print (df1.index.equals(df2.index))` return `True` in your real data? – jezrael May 18 '19 at 13:54
  • I didn't try that , I tried your step one ,it works , mine also works , I didn't try your step two yet. looks pretty good. thanks for your help , I will test and let you know. – LLY May 18 '19 at 13:58
  • @LLY - If working first, then should be also second part of solution too. – jezrael May 18 '19 at 13:59
  • the index length is not the same – LLY May 19 '19 at 02:19
  • @LLY - So is possible change sample data with expecred output? E.g. second DataFrame have 8 rows, first 3. What is logic for check values? Because first is easy, only necessary change `mask = df1['user'].ne(df2['user'])` to `mask = df1['user'].isin(df2['user'])`. But in second is used concat for join DataFrame together, but if different length, then in last rows are appended NaNs. Only idea is use [cross join](https://stackoverflow.com/questions/13269890/cartesian-product-in-pandas) instead concat, but still not sure if need it. – jezrael May 19 '19 at 05:29