
I need to pair the uid from the uid column with each of the uids in the list in the friends column, as shown in the following example:

Given a pandas.DataFrame object A:

    uid friends
0   1   [10, 2, 1, 5]
1   2   [1, 2]
2   3   [5, 4]
3   4   [10, 5]
4   5   [1, 2, 5]

the desired output is:

    uid friends         in_edges
0   1   [10, 2, 1, 5]   [(1, 10), (1, 2), (1, 1), (1, 5)]
1   2   [1, 2]          [(2, 1), (2, 2)]
2   3   [5, 4]          [(3, 5), (3, 4)]
3   4   [10, 5]         [(4, 10), (4, 5)]
4   5   [1, 2, 5]       [(5, 1), (5, 2), (5, 5)]

I use the following code to achieve this outcome:

import numpy as np
import pandas as pd

A = pd.DataFrame(dict(uid=[1, 2, 3, 4, 5], friends=[[10, 2, 1, 5], [1, 2], [5, 4], [10, 5], [1, 2, 5]]))

A.loc[:, 'in_edges'] = A.loc[:, 'uid'].apply(lambda uid: [(uid, f) for f in A.loc[A.loc[:, 'uid']==uid, 'friends'].values[0]])

but the A.loc[A.loc[:, 'uid']==uid, 'friends'] part looks cumbersome to me. Is there an easier way to accomplish this task?

Thanks in advance.

Michael

2 Answers


You can use .apply() with the axis=1 parameter, so that the lambda receives one row at a time (df here is the question's DataFrame A):

df["in_edges"] = df[["uid", "friends"]].apply(
    lambda x: [(x["uid"], f) for f in x["friends"]], axis=1
)
print(df)

Prints:

   uid        friends                           in_edges
0    1  [10, 2, 1, 5]  [(1, 10), (1, 2), (1, 1), (1, 5)]
1    2         [1, 2]                   [(2, 1), (2, 2)]
2    3         [5, 4]                   [(3, 5), (3, 4)]
3    4        [10, 5]                  [(4, 10), (4, 5)]
4    5      [1, 2, 5]           [(5, 1), (5, 2), (5, 5)]
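
A variation on the same idea, offered as a minimal sketch rather than as part of the answer above: since only two columns are involved, a plain list comprehension over zip can build the same lists without a row-wise .apply(). It assumes the sample data from the question and that every entry in friends is a list.

```python
import pandas as pd

# Same sample data as in the question.
A = pd.DataFrame(
    dict(
        uid=[1, 2, 3, 4, 5],
        friends=[[10, 2, 1, 5], [1, 2], [5, 4], [10, 5], [1, 2, 5]],
    )
)

# Walk both columns in lockstep; no boolean lookup by uid is needed
# because zip already hands us the matching friends list for each uid.
A["in_edges"] = [
    [(uid, f) for f in friends]
    for uid, friends in zip(A["uid"], A["friends"])
]
print(A)
```

Assigning a list of lists works here because its length equals the number of rows, so each inner list lands in one cell.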
Andrej Kesely
  • One issue with your solution, though: performance-wise, `.loc[]` is much better than the plain `[]` operator. Is there a way to use `.loc[]` with your solution? – Michael Apr 09 '21 at 22:18
  • @MichaelSidoroff You can use `df.loc[:, ["uid", "friends"]].apply(...)`, but I don't see the point of it. – Andrej Kesely Apr 09 '21 at 22:23
  • `[]` may be about 3 times slower than the same implementation with `.loc[]`, as shown here: https://stackoverflow.com/a/65875826/4596078 – Michael Apr 11 '21 at 09:35
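
The performance question raised in these comments can be measured directly. The following is a rough timing sketch, assuming the question's sample data; the helper names with_brackets and with_loc and the repeat count of 200 are illustrative choices, and timings on a five-row frame may not carry over to larger data.

```python
import timeit

import pandas as pd

# Same sample data as in the question.
A = pd.DataFrame(
    dict(
        uid=[1, 2, 3, 4, 5],
        friends=[[10, 2, 1, 5], [1, 2], [5, 4], [10, 5], [1, 2, 5]],
    )
)


def build_edges(row):
    # Pair the row's uid with each of its friends.
    return [(row["uid"], f) for f in row["friends"]]


def with_brackets():
    # Column selection with the plain [] operator.
    return A[["uid", "friends"]].apply(build_edges, axis=1)


def with_loc():
    # Column selection with .loc[], as suggested in the comment.
    return A.loc[:, ["uid", "friends"]].apply(build_edges, axis=1)


# Time each variant; the absolute numbers depend on machine and pandas version.
t_brackets = timeit.timeit(with_brackets, number=200)
t_loc = timeit.timeit(with_loc, number=200)
print(f"[]   : {t_brackets:.4f} s for 200 runs")
print(f".loc : {t_loc:.4f} s for 200 runs")
```

Both helpers return the same Series of lists, so any difference in the printed timings comes from the column selection alone.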

Why not try itertools.product?

import itertools
A['in_edges'] = A.apply(lambda x: [*itertools.product([x['uid']], x['friends'])], axis=1)
A
Out[50]: 
   uid        friends                           in_edges
0    1  [10, 2, 1, 5]  [(1, 10), (1, 2), (1, 1), (1, 5)]
1    2         [1, 2]                   [(2, 1), (2, 2)]
2    3         [5, 4]                   [(3, 5), (3, 4)]
3    4        [10, 5]                  [(4, 10), (4, 5)]
4    5      [1, 2, 5]           [(5, 1), (5, 2), (5, 5)]
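
To see why product fits here, it may help to look at a single row in isolation. This is a small sketch with the values of the first row hard-coded; the variable names uid, friends and edges are illustrative only.

```python
import itertools

uid = 1
friends = [10, 2, 1, 5]

# The Cartesian product of a one-element list with the friends list
# yields exactly one (uid, friend) tuple per friend, in order.
edges = list(itertools.product([uid], friends))
print(edges)  # [(1, 10), (1, 2), (1, 1), (1, 5)]
```

Wrapping the uid in a list is what makes it a one-element iterable; passing the bare integer would raise a TypeError.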
BENY