Shuffle rows by a column in pandas

Question

I have the following example of dataframe.

    c1     c2
0   1       a
1   2       b
2   3       c
3   4       d
4   5       e

Given a template c1 = [3, 2, 5, 4, 1], I want to change the order of the rows based on the new order of column c1, so it will look like:

    c1     c2
0    3      c
1    2      b
2    5      e
3    4      d
4    1      a

I found the following thread, but the shuffle is random. Cmmiw.

Shuffle DataFrame rows

jezrael · Accepted Answer · 2018-09-10T13:53:06.590

5

If values are unique in list and also in c1 column use reindex:

df = df.set_index('c1').reindex(c1).reset_index()
print (df)
   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a

General solution working with duplicates in list and also in column:

c1 = [3, 2, 5, 4, 1, 3, 2, 3]

#create df from list 
list_df = pd.DataFrame({'c1':c1})
print (list_df)
   c1
0   3
1   2
2   5
3   4
4   1
5   3
6   2
7   3

#helper column for count duplicates values
df['g'] = df.groupby('c1').cumcount()
list_df['g'] = list_df.groupby('c1').cumcount()

#merge together, create index from column and remove g column
df = list_df.merge(df).drop('g', axis=1)
print (df)
   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a
5   3  c

edited Sep 10 '18 at 13:53

answered Sep 10 '18 at 13:28

jezrael

822,522
95
1,334
1,252

the list with unique value is my case. perfect solution. thanks! However I still don't understand the second case. why does the result have 5th index? @jezrael – ipramusinto Sep 10 '18 at 14:20
@bakka - second solution working if there are multiple same values like `3` and `2` in list, then `reindex` raise error. So is necessary count each value by `cumcount` and then merge - first dupe of list with first dupe of column. – jezrael Sep 10 '18 at 14:21

piRSquared · Answer 2 · 2018-09-10T14:10:08.457

`merge`

You can create a dataframe with the column specified in the wanted order then merge.
One advantage of this approach is that it gracefully handles duplicates in either df.c1 or the list c1. If duplicates not wanted then care must be taken to handle them prior to reordering.

d1 = pd.DataFrame({'c1': c1})

d1.merge(df)

   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a

`searchsorted`

This is less robust but will work if df.c1 is:

already sorted
one-to-one mapping

df.iloc[df.c1.searchsorted(c1)]

   c1 c2
2   3  c
1   2  b
4   5  e
3   4  d
0   1  a

Shuffle rows by a column in pandas

2 Answers2

`merge`

`searchsorted`

Linked

Related