
Given a pandas DataFrame x of dimensions n x k, how can we efficiently generate a DataFrame y of dimensions (n choose 2) x 2k, whose rows are all possible pairs of rows from x, each pair concatenated side by side? For example, if x is

[[1,11],
 [2,22],
 [3,33],
 [4,44]]

then y should be

[[1,11,2,22],
 [1,11,3,33],
 [1,11,4,44],
 [2,22,3,33],
 [2,22,4,44],
 [3,33,4,44]]
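For readers who want to reproduce the example, here is a minimal sketch that builds the question's x as a DataFrame and computes the expected shape of y. It only restates the dimensions from the question: (n choose 2) = n(n-1)/2 rows and 2k columns. `math.comb` requires Python 3.8 or later.

```python
import math

import pandas as pd

# The example input from the question: n = 4 rows, k = 2 columns
x = pd.DataFrame([[1, 11], [2, 22], [3, 33], [4, 44]])
n, k = x.shape

# Expected shape of y: one row per unordered pair of rows, twice as many columns
expected_shape = (math.comb(n, 2), 2 * k)
print(expected_shape)  # (6, 4)
```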
Leo

3 Answers


We can try itertools.combinations on the rows of x:

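from itertools import combinations
l = x.values.tolist()  # rows of the DataFrame x as plain lists
[*map(lambda pair: sum(pair, []), combinations(l, r=2))]  # join each pair of rows into one list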
Out[80]: 
[[1, 11, 2, 22],
 [1, 11, 3, 33],
 [1, 11, 4, 44],
 [2, 22, 3, 33],
 [2, 22, 4, 44],
 [3, 33, 4, 44]]
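The question asks for a DataFrame rather than a list of lists. Below is a minimal sketch, assuming x is the 4 x 2 example frame from the question (with hypothetical column names A and B). It feeds the same `combinations` output into `pandas.DataFrame`. It uses `a + b` in place of `sum(pair, [])`, which is equivalent for two-element pairs. The `_1`/`_2` column suffixes are also a hypothetical choice for illustration, not part of the answer above.

```python
from itertools import combinations

import pandas as pd

# Example input from the question (column names A, B are assumed)
x = pd.DataFrame([[1, 11], [2, 22], [3, 33], [4, 44]], columns=["A", "B"])

# Concatenate each unordered pair of rows into one longer row
rows = [a + b for a, b in combinations(x.values.tolist(), r=2)]

# Hypothetical column naming: suffix the original names with _1 and _2
columns = [f"{c}_1" for c in x.columns] + [f"{c}_2" for c in x.columns]
y = pd.DataFrame(rows, columns=columns)
print(y)
```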
BENY
You probably meant `[*map(lambda x: sum(x,[]), itertools.combinations(x.values.tolist(),r=2))]`? Thank you! :) – Leo Jun 02 '20 at 22:15

My attempt

l=[[1,11], [2,22], [3,33], [4,44]]

Full list (note that it also contains the reversed pair [3, 33, 2, 22]):

lst = [x + y for x in l[:-1] for y in l[1:] if x != y]  # x: every row but the last, y: every row but the first; + concatenates them

If you want to eliminate [3, 33, 2, 22], initialize a new list and append x+y only if y+x is not already in it:

k = []
for x in l[:-1]:        # every row but the last
    for y in l[1:]:     # every row but the first
        if x != y and y + x not in k:  # skip a pair whose reversal is already present
            k.append(x + y)
print(k)
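The membership test `y + x not in k` scans the whole result list for every candidate, so its cost grows roughly with the square of the number of pairs. A minimal sketch of an alternative, using the same l as above: pair each row only with the rows that come after it (index j > i), so the reversed pair is never produced and no deduplication is needed. This is an editor's illustration, not part of the original answer.

```python
l = [[1, 11], [2, 22], [3, 33], [4, 44]]

# Pair row i only with rows j > i, so each unordered pair appears exactly once
# and no membership check against the result list is needed.
k = [l[i] + l[j] for i in range(len(l)) for j in range(i + 1, len(l))]
print(k)
```

One behavioural difference: the `x != y` filter compares row values, so it would also drop pairs of duplicate rows if x contained any. The index-based version keeps them.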


wwnde

By modifying Bharath's answer here, I produced a solution:

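import pandas

n = 4
x = pandas.DataFrame([[i, 11 * i] for i in range(1, n + 1)], columns=['A', 'B'])
cnct = lambda l, i=0: pandas.concat(l, axis=i)  # stack rows (i=0) or place side by side (i=1)
# left half: row j repeated once for every later row, in sorted order
left = cnct([x.iloc[:i] for i in range(n)]).sort_index().reset_index(drop=True)
# right half: for each row i, all rows after it
right = cnct([x.iloc[i + 1:] for i in range(n)]).reset_index(drop=True)
z = cnct([left, right], 1)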

For n=10**4, it outperforms the itertools solution.
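The concat construction pairs row i on the left with every row j > i on the right, which is exactly the strictly upper-triangular set of index pairs. Below is a minimal sketch of the same idea using NumPy's `np.triu_indices`. This is a swapped-in technique, not the answer's code: it generates both index arrays directly and selects rows with `iloc`. The `_1`/`_2` column suffixes are a hypothetical choice that avoids the duplicate A, B column names the concat version produces. No timings are claimed here.

```python
import numpy as np
import pandas as pd

n = 4
x = pd.DataFrame([[i, 11 * i] for i in range(1, n + 1)], columns=["A", "B"])

# Row indices (i, j) of all pairs with i < j, in row-major order:
# (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)
i, j = np.triu_indices(len(x), k=1)

# Select the left and right rows of every pair and place them side by side
left = x.iloc[i].add_suffix("_1").reset_index(drop=True)
right = x.iloc[j].add_suffix("_2").reset_index(drop=True)
z = pd.concat([left, right], axis=1)
print(z)
```

The row order matches `itertools.combinations`, since both enumerate pairs lexicographically by index.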

Leo