0

I have 2 dataframes that both contain the same columns but are ordered differently. The first dataframe is ordered this way:

   Code  Description  Value
0  01        A         .3
1  01        B         .5
2  01        C        1.2
3  01        D         .4
4  01        E         .6
5  01        F         .9

The second dataframe is ordered this way:

   Code  Description  Value
0  12        A         .5
1  13        A         .3
2  14        A         .7
3  15        A         2.4
4  16        A         4.6
5  17        A         .6

Basically, they both contain all of the same type of content but are just ordered differently. Dataframe 1 lists all the descriptions in a particular order for a code, then moves on to the next code and repeats. Dataframe 2 lists all the codes for a description, then moves on to the next description and repeats. My question is how can I get dataframe 2 to match the same ordering pattern as dataframe 1?

What I want dataframe 2 to become:

   Code  Description  Value
0  12        A         .5
1  12        B         .6
2  12        C         .1
3  12        D         2.8
4  12        E         .4
5  12        F         .2

I was thinking that some sort of sort function might be useful here, but the main challenge is that I have to sort by description (which is a str), in a particular order that I define (in this case it is just A, B, C but can be any order) while also grouping by code.

Jonathan Chen
  • 303
  • 2
  • 10

2 Answers2

1

If you set the two columns to be sorted in this order:

df.sort_values(by=['Code','Description'])

helps?

emdemor
  • 21
  • 3
0

After getting some helpful feedback I realized what I really needed was how to sort by custom list, so I found sorting by a custom list in pandas . I tried the selected answer which did not work for me but the second answer did. My implementation is below.

sorter=['A','B','C'] #whatever order I want Description to be sorted in 
df.Description = df.Description.astype("category")
df.Description.cat.set_categories(sorter, inplace=True)    
df.sort_values(by=['Code','Description'])
Jonathan Chen
  • 303
  • 2
  • 10