From n x k DataFrame, generate a (n over 2) x 2k DataFrame of all pairs of rows

Question

Given a pandas DataFrame x of dimensions n x k, how can we efficiently generate a DataFrame y of dimensions (n over 2) x 2k whose rows are all possible pairs of rows from x? For example, if x is

[[1,11],
 [2,22],
 [3,33],
 [4,44]]

then y should be

[[1,11,2,22],
 [1,11,3,33],
 [1,11,4,44],
 [2,22,3,33],
 [2,22,4,44],
 [3,33,4,44]]

score 2 · Accepted Answer · answered Jun 02 '20 at 21:41

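We can try itertools.combinations, where l holds the rows as a list of lists:

from itertools import combinations
l = [[1, 11], [2, 22], [3, 33], [4, 44]]  # assumed: the rows of x as a list of lists
[*map(lambda x : sum(x,[]),combinations(l,r=2))]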
Out[80]: 
[[1, 11, 2, 22],
 [1, 11, 3, 33],
 [1, 11, 4, 44],
 [2, 22, 3, 33],
 [2, 22, 4, 44],
 [3, 33, 4, 44]]

BENY

You probably meant `[*map(lambda x: sum(x,[]), itertools.combinations(x.values.tolist(),r=2))]`? Thank you! : ) – Leo Jun 02 '20 at 22:15
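
Putting the answer and the comment together, an end-to-end version starting from a DataFrame might look like the sketch below. The column names A and B and the `_1`/`_2` suffixes on the output columns are assumptions for illustration; the question does not specify them.

```python
from itertools import combinations

import pandas as pd

# Example frame from the question; the column names are assumed.
x = pd.DataFrame([[1, 11], [2, 22], [3, 33], [4, 44]], columns=["A", "B"])

# Each combination is a tuple of two row lists; sum(pair, []) concatenates them.
rows = [sum(pair, []) for pair in combinations(x.values.tolist(), r=2)]

# Hypothetical naming scheme for the 2k output columns.
cols = [f"{c}_1" for c in x.columns] + [f"{c}_2" for c in x.columns]
y = pd.DataFrame(rows, columns=cols)
print(y)
```

Going through `x.values.tolist()` converts every row to plain Python lists, which is simple but may be slow for large n.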

score 1 · Answer 2 · wwnde · 2020-06-03T09:30:51.343

My attempt

l=[[1,11], [2,22], [3,33], [4,44]]

Full list

lst = [x + y for x in l[:-1] for y in l[1:] if x != y]  # + concatenates the two row lists

The full list also contains the reversed pair [3, 33, 2, 22]. To eliminate it, initialize a new list and append x+y only if y+x is not already in it.

k = []
for x in l[:-1]:
    for y in l[1:]:
        if x != y and y + x not in k:  # skip self-pairs and reversed duplicates
            k.append(x + y)
print(k)
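
The membership test `y+x not in k` rescans the result list for every candidate pair. One alternative, sketched here rather than taken from the answer, is to pair each row only with the rows that come after it, so a reversed pair is never produced in the first place. The function name `row_pairs` is hypothetical.

```python
def row_pairs(rows):
    """Concatenate each row with every later row (i < j), so no pair repeats."""
    return [rows[i] + rows[j]
            for i in range(len(rows))
            for j in range(i + 1, len(rows))]


l = [[1, 11], [2, 22], [3, 33], [4, 44]]
k = row_pairs(l)
print(k)
```

Because it compares positions rather than values, this sketch also keeps pairs of rows that happen to have identical contents.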

edited Jun 03 '20 at 09:30

answered Jun 02 '20 at 22:44

wwnde

score 0 · Answer 3 · answered Jun 04 '20 at 00:18

By modifying Bharath's answer here, I produced a solution:

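import pandas

n = 4
x = pandas.DataFrame([[i, 11*i] for i in range(1, n+1)], columns=['A', 'B'])
cnct = lambda l, i=0: pandas.concat(l, axis=i)  # shorthand for concat along axis i
# left half: row i repeated n-1-i times; right half: the rows that follow row i
z = cnct([cnct([x.iloc[:i] for i in range(n)]).sort_index().reset_index(drop=True),
          cnct([x.iloc[i+1:] for i in range(n)]).reset_index(drop=True)], 1)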

For n=10**4, it outperforms the itertools solution.
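
As a sanity check on the small example, the concat-based result can be compared with the itertools result. This is only a sketch of such a check, with the one-liner split into the hypothetical names `left` and `right`; it does not measure the timing claim.

```python
from itertools import combinations

import pandas as pd

n = 4
x = pd.DataFrame([[i, 11 * i] for i in range(1, n + 1)], columns=["A", "B"])

# Concat-based construction: first row of each pair, then second row.
left = pd.concat([x.iloc[:i] for i in range(n)]).sort_index().reset_index(drop=True)
right = pd.concat([x.iloc[i + 1:] for i in range(n)]).reset_index(drop=True)
z = pd.concat([left, right], axis=1)

# itertools-based reference built from the same frame.
expected = [a + b for a, b in combinations(x.values.tolist(), 2)]

same = z.values.tolist() == expected
print(same)
```

Note that z ends up with duplicate column names (A, B, A, B), so renaming the columns may be worthwhile before further use.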
