2

I have the following example of dataframe.

    c1     c2
0   1       a
1   2       b
2   3       c
3   4       d
4   5       e

Given a template c1 = [3, 2, 5, 4, 1], I want to change the order of the rows based on the new order of column c1, so it will look like:

    c1     c2
0    3      c
1    2      b
2    5      e
3    4      d
4    1      a

I found the following thread, but the shuffle is random. Cmmiw.

Shuffle DataFrame rows

ipramusinto
  • 2,310
  • 2
  • 14
  • 24

2 Answers2

5

If values are unique in list and also in c1 column use reindex:

df = df.set_index('c1').reindex(c1).reset_index()
print (df)
   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a

General solution working with duplicates in list and also in column:

c1 = [3, 2, 5, 4, 1, 3, 2, 3]

#create df from list 
list_df = pd.DataFrame({'c1':c1})
print (list_df)
   c1
0   3
1   2
2   5
3   4
4   1
5   3
6   2
7   3

#helper column for count duplicates values
df['g'] = df.groupby('c1').cumcount()
list_df['g'] = list_df.groupby('c1').cumcount()

#merge together, create index from column and remove g column
df = list_df.merge(df).drop('g', axis=1)
print (df)
   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a
5   3  c
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • the list with unique value is my case. perfect solution. thanks! However I still don't understand the second case. why does the result have 5th index? @jezrael – ipramusinto Sep 10 '18 at 14:20
  • @bakka - second solution working if there are multiple same values like `3` and `2` in list, then `reindex` raise error. So is necessary count each value by `cumcount` and then merge - first dupe of list with first dupe of column. – jezrael Sep 10 '18 at 14:21
3

merge

You can create a dataframe with the column specified in the wanted order then merge.
One advantage of this approach is that it gracefully handles duplicates in either df.c1 or the list c1. If duplicates not wanted then care must be taken to handle them prior to reordering.

d1 = pd.DataFrame({'c1': c1})

d1.merge(df)

   c1 c2
0   3  c
1   2  b
2   5  e
3   4  d
4   1  a

searchsorted

This is less robust but will work if df.c1 is:

  • already sorted
  • one-to-one mapping

df.iloc[df.c1.searchsorted(c1)]

   c1 c2
2   3  c
1   2  b
4   5  e
3   4  d
0   1  a
piRSquared
  • 285,575
  • 57
  • 475
  • 624