Id    1    2    3    4

z1    3    0    2    2 
z1    1    4    4    3
z2    8    1    7    9
z2    0    0    2    3
z2    5    6    7    9
z3    0    5    6    2
z3    4    4    8    2

Here is my data. I want to group every column into lists by Id, so the result should look like this:

Id    1        2        3        4

z1    [3,1]    [0,4]    [2,4]    [2,3]
z2    [8,0,5]  [1,0,6]  [7,2,7]  [9,3,9]
z3    [0,4]    [5,4]    [6,8]    [2,2]

I can do this for every column separately, and I have already done that, but now I need to optimize it. Is there a way to do this once for all columns? If not, is there a way that works faster than pandas.groupby?

1 Answer

First, I think working with lists in pandas is not a good idea.

But if you really need it, it is possible with DataFrameGroupBy.agg and list:

# collect the values of every non-key column into one list per Id
df = df.groupby('Id').agg(list)
print(df)
            1          2          3          4
Id                                            
z1     [3, 1]     [0, 4]     [2, 4]     [2, 3]
z2  [8, 0, 5]  [1, 0, 6]  [7, 2, 7]  [9, 3, 9]
z3     [0, 4]     [5, 4]     [6, 8]     [2, 2]
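
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It rebuilds the sample data from the question (the string column labels "1" to "4" are an assumption about how the real columns are named) and checks that the single `agg(list)` call gives the same lists as grouping every column separately, which is the approach described in the question.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: the column labels are the strings "1".."4".
df = pd.DataFrame({
    "Id": ["z1", "z1", "z2", "z2", "z2", "z3", "z3"],
    "1": [3, 1, 8, 0, 5, 0, 4],
    "2": [0, 4, 1, 0, 6, 5, 4],
    "3": [2, 4, 7, 2, 7, 6, 8],
    "4": [2, 3, 9, 3, 9, 2, 2],
})

# One groupby call that aggregates all value columns into lists.
grouped = df.groupby("Id").agg(list)

# The per-column approach from the question, for comparison only.
per_column = pd.concat(
    {col: df.groupby("Id")[col].apply(list) for col in ["1", "2", "3", "4"]},
    axis=1,
)

# Compare the two results cell by cell via plain dictionaries.
same = grouped.to_dict() == per_column.to_dict()

print(grouped)
print("same result as per-column grouping:", same)
```

This only checks that the two approaches agree on the result; it says nothing about which one is faster on a large DataFrame.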
jezrael
  • Why is it a bad idea? Also, can you point me in the right direction if it's no good? – Mels Hakobyan Feb 05 '19 at 18:03
  • @Mels Hakobyan You can check the link in the answer, but it all depends on what you need. For a small DataFrame, up to about 1k rows, it works nicely. The problem comes with a large DataFrame of 100k+ rows: it still works, but any further processing is noticeably slow. If performance is not important, use this solution without worry. Is it explained better now? – jezrael Feb 05 '19 at 18:32
  • Sure, thank you. Performance is crucial in this case; I am trying to minimize running time as much as possible. – Mels Hakobyan Feb 05 '19 at 18:49
  • Is it possible for your code to run longer than grouping every column separately? Because that's what I currently get. – Mels Hakobyan Feb 06 '19 at 10:38
  • @MelsHakobyan - What is the size of the input DataFrame? What is the size of the output? How many rows? – jezrael Feb 06 '19 at 11:07
  • (4000000, 4); the size of the output is (5000, 4). – Mels Hakobyan Feb 06 '19 at 18:11
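
Since the comments above say performance matters and that lists in pandas make later processing slow, one possible alternative is to skip the list cells entirely and keep each group as a NumPy block. The sketch below is an assumption, not something proposed in the thread, and it has not been timed on the 4,000,000-row input mentioned above: it sorts the rows by Id once and splits the value array at the group boundaries.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
# Assumption: the column labels are the strings "1".."4".
df = pd.DataFrame({
    "Id": ["z1", "z1", "z2", "z2", "z2", "z3", "z3"],
    "1": [3, 1, 8, 0, 5, 0, 4],
    "2": [0, 4, 1, 0, 6, 5, 4],
    "3": [2, 4, 7, 2, 7, 6, 8],
    "4": [2, 3, 9, 3, 9, 2, 2],
})

# Stable sort by Id so rows keep their original order inside each group.
order = np.argsort(df["Id"].to_numpy(), kind="stable")
ids = df["Id"].to_numpy()[order]
values = df.drop(columns="Id").to_numpy()[order]

# Index of the first row of each Id in the sorted array.
unique_ids, starts = np.unique(ids, return_index=True)

# One 2-D block per Id: rows are that Id's rows, columns are the value columns.
blocks = dict(zip(unique_ids, np.split(values, starts[1:])))

# Values of the first value column for z2.
print(blocks["z2"][:, 0])
```

Each block stays a regular integer array, so column-wise operations on a group can use vectorized NumPy calls instead of iterating over Python lists. Whether this is actually faster than `groupby` for a given dataset would need to be measured.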