
I have a pandas DataFrame with two columns, user1 and user2.

Now I want to compute the transitive relation: if A is related to B, B to C, and C to D, I want the output as a list such as "A-B-C-D" in one group and "E-F-G" in another group.

Thanks

André Kool
  • Please do not vandalize your posts. If you believe your question is not useful or is no longer useful, it should be *deleted* instead of editing out all of the data that actually makes it a question. By posting on the Stack Exchange network, you've granted a non-revocable right for SE to distribute that content (under the CC BY-SA 3.0 license). By SE policy, any vandalism will be reverted. – Filnor Jun 21 '18 at 08:43
  • Wow, wanted to see how fast you detect this. You guys are good! – Naman Doshi Jun 21 '18 at 08:47

2 Answers


If you have just two groups, you can do it this way. It only works for two groups, though, and does not generalize:

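# Seed the first group with both users from the first row.
x = {df['user1'].iloc[0], df['user2'].iloc[0]}
y = set()

# Note: this is order-dependent. A row joins x only if one of its
# users was already placed in x by an earlier row.
for u1, u2 in zip(df['user1'], df['user2']):
    if u1 in x or u2 in x:
        # Either user is already in the first group, so both belong to it.
        x.update((u1, u2))
    else:
        # Everything else is assumed to belong to the second group.
        y.update((u1, u2))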
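
The two-group limitation can be lifted by treating each row as an edge and merging groups with a union-find (disjoint-set) structure. This is only a minimal sketch and not part of the original answer: the helper name `group_users` and the sample `pairs` list are hypothetical, and it assumes the relation is symmetric (A related to B implies B related to A). With a DataFrame, the pairs could be taken from `zip(df['user1'], df['user2'])`.

```python
def group_users(pairs):
    """Group users connected through any chain of pairs (union-find)."""
    parent = {}

    def find(u):
        # Register unseen users as their own root.
        parent.setdefault(u, u)
        # Walk up to the root, halving the path along the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Collect the members of each group under their root.
    groups = {}
    for user in parent:
        groups.setdefault(find(user), []).append(user)
    # Sort members and groups for a stable, readable result.
    return sorted("-".join(sorted(members)) for members in groups.values())


# Hypothetical sample data; assumption: the relation is symmetric.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G")]
print(group_users(pairs))  # ['A-B-C-D', 'E-F-G']
```

Members are joined in sorted order, so the string describes the group rather than a particular path through it.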
Joe

To find all the transitive relationships, you most likely need recursion. Perhaps the following piece of code helps:

import pandas as pd
data={'user1':['A','A','B', 'C', 'E', 'F'],
      'user2':['B', 'C','C','D','F','G']}
df=pd.DataFrame(data)

print(df)

# this method is similar to a common table expression (CTE) in SQL
def cte(df_anchor,df_ref,level):
    if (level==0):
        df_anchor.insert(0, 'user_root',df_anchor['user1'])
        df_anchor['level']=0
        df_anchor['relationship']=df_anchor['user1']+'-'+df_anchor['user2']
        _df_anchor=df_anchor
    if (level>0):
        _df_anchor=df_anchor[df_anchor.level==level]
    _df=pd.merge(_df_anchor, df_ref , left_on='user2', right_on='user1', how='inner', suffixes=('', '_x'))

    if not(_df.empty):
        _df['relationship']=_df['relationship']+'-'+_df['user2_x']
        _df['level']=_df['level']+1
        _df=_df[['user_root','user1_x', 'user2_x', 'level','relationship']].rename(columns={'user1_x': 'user1', 'user2_x': 'user2'})
        df_anchor_new=pd.concat([df_anchor, _df])
        return cte(df_anchor_new, df_ref, level+1)
    else:
        return df_anchor

df_rel=cte(df, df, 0)
print("\nall relationship=\n",df_rel)

print("\nall relationship related to A=\n", df_rel[df_rel.user_root=='A'])

  user1 user2
0     A     B
1     A     C
2     B     C
3     C     D
4     E     F
5     F     G

all relationship=
   user_root user1 user2  level relationship
0         A     A     B      0          A-B
1         A     A     C      0          A-C
2         B     B     C      0          B-C
3         C     C     D      0          C-D
4         E     E     F      0          E-F
5         F     F     G      0          F-G
0         A     B     C      1        A-B-C
1         A     C     D      1        A-C-D
2         B     C     D      1        B-C-D
3         E     F     G      1        E-F-G
0         A     C     D      2      A-B-C-D

all relationship related to A=
   user_root user1 user2  level relationship
0         A     A     B      0          A-B
1         A     A     C      0          A-C
0         A     B     C      1        A-B-C
1         A     C     D      1        A-C-D
0         A     C     D      2      A-B-C-D
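
One caveat that seems worth noting: the recursion stops only when a level produces no new rows, so input containing a cycle (for example A-B together with B-A) would presumably keep recursing until Python's recursion limit is reached. If only the groups are needed rather than every path, an iterative breadth-first traversal with a visited set sidesteps this. The sketch below is an illustration under assumptions, not the answer's method: `connected_groups` and the sample pairs are hypothetical, and the relation is treated as undirected.

```python
from collections import defaultdict, deque


def connected_groups(pairs):
    """Return groups of transitively related users via breadth-first search."""
    # Assumption: each pair is an undirected edge.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    visited = set()
    groups = []
    for start in sorted(neighbours):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        members = []
        while queue:
            user = queue.popleft()
            members.append(user)
            # Only enqueue users not seen before, so cycles terminate.
            for other in neighbours[user] - visited:
                visited.add(other)
                queue.append(other)
        groups.append("-".join(sorted(members)))
    return groups


# Hypothetical data containing a cycle (A-B and B-A).
pairs = [("A", "B"), ("B", "A"), ("B", "C"), ("E", "F")]
print(connected_groups(pairs))  # ['A-B-C', 'E-F']
```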
DTT