-1

I have two data frames, df1 and df2:

import pandas as pd
df1 = pd.DataFrame({"TestCaseName": ['B', 'D', 'A', 'E', 'C']})

  TestCaseName
0     B
1     D
2     A
3     E
4     C

and another data frame

df2 = pd.DataFrame({"TestCaseName": ['A', 'B', 'C', 'D', 'E'], "NameSpace": ['T2', 'T3', 'T6', 'T1', 'T8']})

  TestCaseName NameSpace
0            A        T2
1            B        T3
2            C        T6
3            D        T1
4            E        T8

What I want is to sort the TestCaseName column of df2 so that it follows the order in df1.

Here is what I have tried:

df2 = df2.set_index('TestCaseName')
df2 = df2.reindex(index=df1['TestCaseName'])
df2 = df2.reset_index()

This gives me the error ValueError: cannot reindex from a duplicate axis

Desired Output:

  TestCaseName NameSpace
0     B           T3
1     D           T1 
2     A           T2
3     E           T8
4     C           T6

Can someone tell me what I am doing wrong, or suggest a better approach?

Jforpython
  • Does this answer your question? [sorting by a custom list in pandas](https://stackoverflow.com/questions/23482668/sorting-by-a-custom-list-in-pandas) – Alex Jun 27 '21 at 15:50
  • Just merge? `df2.merge(df1['TestCaseName'],how='right')` – anky Jun 27 '21 at 15:51

3 Answers

0

Do a right merge of the two dataframes, with df2 on the left and df1 on the right:

>>> df2.merge(df1, how='right')

  TestCaseName NameSpace
0            B        T3
1            D        T1
2            A        T2
3            E        T8
4            C        T6

PS: This assumes that every TestCaseName in df2 also appears in df1; rows of df2 whose name is missing from df1 are dropped.
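
As a minimal sketch of the right merge on the question's sample data, the snippet below also shows what happens when df1 holds a name that df2 lacks. The frame `df1_extra` and the name "Z" are hypothetical and only there for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"TestCaseName": ["B", "D", "A", "E", "C"]})
df2 = pd.DataFrame({
    "TestCaseName": ["A", "B", "C", "D", "E"],
    "NameSpace": ["T2", "T3", "T6", "T1", "T8"],
})

# Right merge: every row of df1 is kept, following df1's key order.
ordered = df2.merge(df1, how="right", on="TestCaseName")

# Hypothetical case: "Z" exists only in the ordering frame.
df1_extra = pd.DataFrame({"TestCaseName": ["B", "Z", "A"]})
with_gap = df2.merge(df1_extra, how="right", on="TestCaseName")

# "Z" has no match in df2, so its NameSpace is NaN.
print(ordered)
print(with_gap)
```

Passing `on="TestCaseName"` explicitly is optional here, since it is the only shared column, but it keeps the join key obvious if more columns are added later.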

ThePyGuy
0

This error means that the index you are reindexing contains duplicate values, i.e. some TestCaseName values are repeated in df2 once that column becomes the index. You can use a merge instead, for example:

df2.merge(df1['TestCaseName'], how='right')
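
The sample data in the question has no repeated names, so the real data presumably does. The sketch below uses a hypothetical `df2_dup` with "A" listed twice to reproduce the failure and to show that the merge still works. The exact wording of the ValueError differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df1 = pd.DataFrame({"TestCaseName": ["B", "A"]})

# Hypothetical df2 in which "A" appears twice.
df2_dup = pd.DataFrame({
    "TestCaseName": ["A", "A", "B"],
    "NameSpace": ["T2", "T9", "T3"],
})

# Quick check for repeated names before reindexing.
has_duplicates = df2_dup["TestCaseName"].duplicated().any()

# Reindexing a frame whose index has duplicate labels raises ValueError.
try:
    df2_dup.set_index("TestCaseName").reindex(index=df1["TestCaseName"])
    reindex_failed = False
except ValueError:
    reindex_failed = True

# The right merge keeps both "A" rows instead of failing.
merged = df2_dup.merge(df1["TestCaseName"], how="right")
print(has_duplicates, reindex_failed)
print(merged)
```

If the duplicates are unwanted, dropping them first with `drop_duplicates(subset="TestCaseName")` would also let the original reindex approach run.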

R_Dax
0

Here’s one way:

df2.TestCaseName = pd.Categorical(df2.TestCaseName,
                                  categories=df1.TestCaseName.values,
                                  ordered=True)

df2 = df2.sort_values('TestCaseName')
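
A self-contained version of the same idea on the question's sample data might look like the sketch below. It uses `assign` so that df2 itself is left unchanged; the names `order` and `result` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"TestCaseName": ["B", "D", "A", "E", "C"]})
df2 = pd.DataFrame({
    "TestCaseName": ["A", "B", "C", "D", "E"],
    "NameSpace": ["T2", "T3", "T6", "T1", "T8"],
})

# The desired order, taken from df1 (the names must be unique).
order = df1["TestCaseName"].tolist()

result = (
    df2.assign(
        # Ordered categorical: sorting follows `order`, not the alphabet.
        TestCaseName=pd.Categorical(
            df2["TestCaseName"], categories=order, ordered=True
        )
    )
    .sort_values("TestCaseName")
    .reset_index(drop=True)
)
print(result)
```

Two caveats: `pd.Categorical` rejects duplicate categories, so df1 must not repeat a name, and any df2 name that is absent from df1 becomes NaN and is sorted to the end.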

Nk03