Sorting columns having lists with reference to each other in pandas

Question

Suppose my data frame is like:

   A             B           Date
[1,3,2]    ['a','b','c']     date1

I want to sort both the columns but with reference to each other. Like the output should be:

   A             B           Date
[1,2,3]    ['a','c','b']     date1

Now if these had been two lists only I would have sorted through zip method.

But as these are columns of data frame. I am not sure how to use apply method to sort these with reference to each other.

My dataframe as a whole is sorted on the basis of third column (date). Now for each such dates the other two columns have list, each having same number of values. I want to sort those lists on the basis of each other

Are lists in all rows equally sized (i..e, size 3 in this case)? — cs95, May 30 '20 at 23:14

score 2 · Accepted Answer · answered May 30 '20 at 23:21

2

If all cells have the same number of values, try this flattening and groupby approach:

df
           A          B
0  [1, 3, 2]  [a, b, c]
1  [4, 6, 5]  [d, f, e]  # added an extra row for demonstration

(df.apply(pd.Series.explode)
   .groupby(level=0)
   .apply(lambda x: x.sort_values('A'))
   .groupby(level=0)
   .agg(list))

           A          B
0  [1, 2, 3]  [a, c, b]
1  [4, 5, 6]  [d, e, f]

Obligatory disclaimer: Please don't use pandas to store lists in columns.

answered May 30 '20 at 23:21

cs95

379,657
97
704
746

The source data was not in such format. It had a ambiguous date column and I wanted to plot the data through animation. But there were 3k+ rows so it was processing very slowly because of ambiguous rows. TO solve this I used group by on date column and merged the values of other two columns. Now there are 112 rows only – EXODIA May 30 '20 at 23:32
@EXODIA Thanks for the context but I don't know if that matters? Would this solution not work on the 112 row df that you now have? I did ask you if your cells were all equally sized (== 3). – cs95 May 30 '20 at 23:34
Yes they are, I replied back on the part where you mentioned 'Please don't use pandas to store lists in columns'. I am trying to apply the solution now. – EXODIA May 30 '20 at 23:36
What does ```level=0``` indicate here ? – EXODIA May 30 '20 at 23:38
@EXODIA It indicates you're grouping on the index. – cs95 May 30 '20 at 23:42
In my date column I got the list too of same size and same value but list is not wanted in date column – EXODIA May 30 '20 at 23:45
@EXODIA Can you drop the 'date' column, then add it back later once you're done sorting A and B? – cs95 May 30 '20 at 23:52

score 1 · Answer 2 · answered May 30 '20 at 23:35

1

Similar to cs95's solution, but with sort_values and sort_index:

# sample data
df = pd.DataFrame({
    'A':[[1,3,2],[2,1]],
    'B':[['a','b','c'],['c','d']],
    'Date':['date1','date2']
})

df[['A','B']] = (df[['A','B']].apply(pd.Series.explode)
    .sort_values('A',kind='mergesort')
    .sort_index(kind='merge_sort')
    .groupby(level=0).agg(list)
)

Output:

           A          B   Date
0  [1, 2, 3]  [a, c, b]  date1
1     [1, 2]     [d, c]  date2

answered May 30 '20 at 23:35

Quang Hoang

146,074
10
56
74

I cant understand the ```level=0``` part – EXODIA May 30 '20 at 23:40
And since only A column is mentioned here so is it like it will sort every other column (if I have more column with lists having equal size) – EXODIA May 30 '20 at 23:41
In my date column I got the list too of same size and same value but list is not wanted in date column – EXODIA May 30 '20 at 23:45

Sorting columns having lists with reference to each other in pandas

2 Answers2