I had asked a similar question earlier, but I'm looking for a different output.

Create a dataframe of permutations in pandas from list

My lists are as follows:

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

Now I want to create a dataframe as follows:

aa    bb    cc
aa1   bb1   cc1
aa2   bb1   cc1
aa3   bb1   cc1
aa4   bb1   cc1
aa5   bb1   cc1
aa1   bb2   cc1
aa1   bb3   cc1
aa1   bb4   cc1
aa1   bb5   cc1
aa1   bb1   cc2
aa1   bb1   cc3
aa1   bb1   cc4
aa1   bb1   cc5

The previous suggestion I received was to use:

import itertools
import pandas as pd
lists = [aa, bb, cc]
pd.DataFrame(list(itertools.product(*lists)), columns=['aa', 'bb', 'cc'])

Which gives me a cartesian product.

But this time, that's not quite what I'm looking for. I want the output to be exactly like the example above: each element of a list appears only once in its column, except for the first element, which is repeated to fill the rest of the column.

Appreciate any help!

cs95
Kvothe
  • It's not clear how you get this output. – cs95 Oct 14 '17 at 07:44
  • Well, this output is a subset of the cartesian product solution. Essentially, it's taking each list, adding to a column, then appending the second list into the second column, but with a row offset which is equal to the length of the first list, etc. – Kvothe Oct 14 '17 at 07:53
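
To make the "subset of the cartesian product" description concrete, here is a minimal sketch (not from the thread) that keeps only the product rows differing from the first-element row in at most one column. The names `firsts`, `rows` and `subset` are made up for illustration. Note that the rows come out in `itertools.product` order, so the row order differs from the example in the question.

```python
import itertools
import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']
lists = [aa, bb, cc]

# Baseline row: the first element of every list.
firsts = tuple(values[0] for values in lists)

# Keep a product row only if at most one column differs from the baseline.
rows = [
    row
    for row in itertools.product(*lists)
    if sum(a != b for a, b in zip(row, firsts)) <= 1
]

subset = pd.DataFrame(rows, columns=['aa', 'bb', 'cc'])
```

This yields 13 rows, the same count as the example output in the question: the baseline row once, plus four extra rows per column.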

1 Answer

First construct the repeating parts:

index = pd.RangeIndex(len(aa) + len(bb) + len(cc))
df = pd.DataFrame({'aa':aa[0], 'bb':bb[0], 'cc':cc[0]}, index)

That gives you 15 copies of:

aa1   bb1   cc1

Then overwrite the varying parts:

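# Assign by position with .iloc; chained assignment (df.aa[:n] = ...) does not update df under pandas Copy-on-Write
df.iloc[:len(aa), 0] = aa
df.iloc[len(aa):len(aa)+len(bb), 1] = bb
df.iloc[len(aa)+len(bb):, 2] = cc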

Which gives the desired output.

John Zwinck
  • Ah, this is perfect! Any idea how to make it work with `n` columns without having to type each one manually? – Kvothe Oct 16 '17 at 16:23
  • @Kvothe: Sure, just make a dict like `{'aa':aa, 'bb':bb, 'cc':cc}` and iterate over it (`.items()`) to do each of the operations. – John Zwinck Oct 17 '17 at 03:48
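
A possible sketch of that dict-based generalization, under some assumptions: the function name `vary_one_column_at_a_time` is hypothetical, every list is non-empty, and the dict preserves insertion order (Python 3.7+). It fills every row with the first elements, then overwrites one block of rows per column using positional `.iloc` assignment.

```python
import pandas as pd

def vary_one_column_at_a_time(columns):
    """Build a frame where each block of rows varies exactly one column.

    `columns` maps column name -> list of values (hypothetical helper,
    not part of pandas).
    """
    total_rows = sum(len(values) for values in columns.values())

    # Start with the first element of every list repeated on every row.
    defaults = {name: values[0] for name, values in columns.items()}
    df = pd.DataFrame(defaults, index=pd.RangeIndex(total_rows))

    # Overwrite one contiguous block of rows per column.
    start = 0
    for position, values in enumerate(columns.values()):
        stop = start + len(values)
        df.iloc[start:stop, position] = values
        start = stop
    return df

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

df = vary_one_column_at_a_time({'aa': aa, 'bb': bb, 'cc': cc})

# Optional: drop the repeated all-first-elements rows.
deduped = df.drop_duplicates().reset_index(drop=True)
```

As in the answer, `df` has one row per list element (15 here), so the all-first-elements row shows up once per column. The optional `drop_duplicates()` step keeps only its first occurrence, which leaves the 13 rows shown in the question.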