I simply want to reorder the rows of my pandas dataframe such that col1 matches the order of the external list elements in my_order.
import pandas as pd
d = {'col1': ['A', 'B', 'C'], 'col2': [1,2,3]}
df = pd.DataFrame(data=d)
my_order = ['B', 'C', 'A']
The post "sorting by a custom list in pandas" does the ordering work, and applying it to my data produces:
d = {'col1': ['A', 'B', 'C'], 'col2': [1,2,3]}
df = pd.DataFrame(data=d)
my_order = ['B', 'C', 'A']
df.col1 = df.col1.astype("category")
# assign the result back; the inplace argument was removed in pandas 2.0
df.col1 = df.col1.cat.set_categories(my_order)
df.sort_values(["col1"])
However, this seems like a wasteful amount of code compared with the R equivalent, which would simply be:
df = data.frame(col1 = c('A','B','C'), col2 = c(1,2,3))
my_order = c('B', 'C', 'A')
df[match(my_order, df$col1),]
Ordering is expensive, and the Python version above takes 3 steps where R takes only 1 using the match function. Can Python not rival R in this case?
If this were done only once in my real-world example I wouldn't care much. But this process will be repeated millions of times in a web server application, so a truly minimal, inexpensive path is the best approach.
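For reference, the one-step lookup that match performs in the R version can be sketched in pandas roughly as follows. This is only a sketch under assumptions not stated above: the values in col1 are unique and every element of my_order occurs in col1. The names positions and reordered are illustrative, not part of any existing code.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})
my_order = ["B", "C", "A"]

# Position of each my_order element within col1, comparable to
# R's match(my_order, df$col1) but 0-based. Assumes col1 is unique;
# an element missing from col1 would come back as -1.
positions = pd.Index(df["col1"]).get_indexer(my_order)

# Reorder the rows by those positions in a single indexing step.
reordered = df.iloc[positions]
print(reordered)
```

Note that a -1 from get_indexer would make iloc silently pick the last row, so missing values would need to be checked separately if they can occur.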