Using Pandas, I have two data sets stored in two separate dataframes. Each dataframe is composed of two series.
The first dataframe has a series called 'name' and a second series called 'attributes', where each value is a list of strings. It looks something like this:
   name      attributes
0  John      [ABC, DEF, GHI, JKL, MNO, PQR, STU]
1  Mike      [EUD, DBS, QMD, ABC, GHI]
2  Jane      [JKL, EJD, MDE, MNO, DEF, ABC]
3  Kevin     [FHE, EUD, GHI, MNO, ABC, AUE, HSG, PEO]
4  Stefanie  [STU, EJD, DUE]
The second dataframe is similar, with the first series called 'username' and the second called 'attr':
   username    attr
0  username_1  [DHD, EOA, AUE, CHE, ABC, PQR, QJF]
1  username_2  [ABC, EKR, ADT, GHI, JKL, EJD, MNO, MDE]
2  username_3  [DSB, AOD, DEF, MNO, DEF, ABC, TAE]
3  username_4  [DJH, EUD, GHI, MNO, ABC, FHE]
4  username_5  [CHQ, ELT, ABC, DEF, GHI]
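For reproducibility, here is a minimal sketch that rebuilds the two dataframes shown above. It assumes each cell holds a real Python list of strings (pandas prints such lists without quotation marks). The variable names `df1` and `df2` are placeholders chosen for illustration.

```python
import pandas as pd

# First dataframe: one row per name, with a list of attribute codes.
df1 = pd.DataFrame({
    "name": ["John", "Mike", "Jane", "Kevin", "Stefanie"],
    "attributes": [
        ["ABC", "DEF", "GHI", "JKL", "MNO", "PQR", "STU"],
        ["EUD", "DBS", "QMD", "ABC", "GHI"],
        ["JKL", "EJD", "MDE", "MNO", "DEF", "ABC"],
        ["FHE", "EUD", "GHI", "MNO", "ABC", "AUE", "HSG", "PEO"],
        ["STU", "EJD", "DUE"],
    ],
})

# Second dataframe: one row per username, with a list of attribute codes.
df2 = pd.DataFrame({
    "username": ["username_1", "username_2", "username_3",
                 "username_4", "username_5"],
    "attr": [
        ["DHD", "EOA", "AUE", "CHE", "ABC", "PQR", "QJF"],
        ["ABC", "EKR", "ADT", "GHI", "JKL", "EJD", "MNO", "MDE"],
        ["DSB", "AOD", "DEF", "MNO", "DEF", "ABC", "TAE"],
        ["DJH", "EUD", "GHI", "MNO", "ABC", "FHE"],
        ["CHQ", "ELT", "ABC", "DEF", "GHI"],
    ],
})

print(df1)
print(df2)
```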
What I'm trying to achieve is to compare the attributes (second series) of each dataframe to see which names and usernames share the most attributes.
For example, 5 of username_4's 6 attributes match Kevin's.
I thought of looping over one of the attributes series and checking for matches in each row of the other series, but I couldn't get the loop to work (maybe because the strings in my lists don't have quotation marks around them?).
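The looping idea can be sketched roughly as follows. This is only an illustration under assumptions: it uses a two-row subset of each dataframe, it assumes the cells are real Python lists (if they are actually strings such as "[ABC, DEF]", they would need parsing first), and the names `names`, `users`, `overlap`, `shared` and `out_of` are made up for the example. Each pair of rows is compared with a set intersection.

```python
import pandas as pd

# Small subset of the data shown above, built as real Python lists.
names = pd.DataFrame({
    "name": ["Kevin", "Stefanie"],
    "attributes": [
        ["FHE", "EUD", "GHI", "MNO", "ABC", "AUE", "HSG", "PEO"],
        ["STU", "EJD", "DUE"],
    ],
})
users = pd.DataFrame({
    "username": ["username_4", "username_5"],
    "attr": [
        ["DJH", "EUD", "GHI", "MNO", "ABC", "FHE"],
        ["CHQ", "ELT", "ABC", "DEF", "GHI"],
    ],
})

rows = []
# Compare every name with every username (nested loop over both frames).
for name, attributes in zip(names["name"], names["attributes"]):
    for username, attr in zip(users["username"], users["attr"]):
        shared = set(attributes) & set(attr)  # attributes present in both lists
        rows.append({
            "name": name,
            "username": username,
            "shared": len(shared),    # number of matching attributes
            "out_of": len(set(attr)), # distinct attributes the username has
        })

# Pairs with the most shared attributes come first.
overlap = (
    pd.DataFrame(rows)
    .sort_values("shared", ascending=False)
    .reset_index(drop=True)
)
print(overlap)
```

With this subset, the top row is the Kevin / username_4 pair with 5 shared attributes out of 6. The nested loop compares every name with every username, so the work grows with the product of the two row counts.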
I don't really know what options exist for comparing those two series and ending up with a result like the one above (5 of username_4's 6 attributes match Kevin's).
What approaches are possible here?