I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame([
    ['A2', 2],
    ['B1', 1],
    ['A1', 2],
    ['A2', 1],
    ['B1', 2],
    ['A1', 1]],
    columns=['one', 'two'])
I am hoping to sort it primarily by column 'two', then by column 'one'. For the secondary sort, I would like to use a custom rule that sorts column 'one' by its leading alphabetic character [A-Z] and then by its trailing number [0-100]. So, the outcome of the sort would be:
one  two
A1   1
B1   1
A2   1
A1   2
B1   2
A2   2
I have previously sorted a list of strings similar to column 'one' using a sorting rule like this:
def custom_sort(value):
    return (value[0], int(value[1:]))

my_list.sort(key=custom_sort)
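To make the list case concrete, here is a small runnable check of that key. The contents of my_list are a hypothetical sample, chosen so that the numeric comparison matters ('A10' should land after 'A2').

```python
def custom_sort(value):
    # Compare the leading letter first, then the trailing digits as an integer.
    return (value[0], int(value[1:]))

my_list = ["A10", "B1", "A2", "A1"]  # hypothetical sample values
my_list.sort(key=custom_sort)
print(my_list)  # ['A1', 'A2', 'A10', 'B1']
```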
If I try to apply this rule via a pandas sort, I run into a number of issues including:
- The pandas DataFrame.sort_values() method accepts a key for sorting, like the list sort() method, but the key function should be vectorized (per the pandas documentation). If I try to apply the sorting key to only column 'one', I get the error "TypeError: cannot convert the series to <class 'int'>".
- DataFrame.sort_values() applies the sort key to every column passed in. This will not work, since I want to sort first by column 'two' using a native numerical sort.
How would I go about sorting the DataFrame as mentioned above?
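One possible direction, offered as a minimal sketch and not a definitive answer: split column 'one' into temporary helper columns and do an ordinary multi-column sort, which avoids the key argument entirely. The names sort_with_helpers, letter and number are hypothetical, and the sketch assumes every value in 'one' is a single letter followed by digits. Note that sorting by letter and then number (the rule in custom_sort) gives A1, A2, B1 within each group, while putting number ahead of letter gives the A1, B1, A2 order listed in the desired output.

```python
import pandas as pd

df = pd.DataFrame(
    [["A2", 2], ["B1", 1], ["A1", 2], ["A2", 1], ["B1", 2], ["A1", 1]],
    columns=["one", "two"],
)

def sort_with_helpers(frame, by):
    # Hypothetical helper: derive sortable pieces of 'one' with vectorized
    # string operations, sort on the requested columns, then drop the helpers.
    helpers = frame.assign(
        letter=frame["one"].str[0],               # leading character, e.g. 'A'
        number=frame["one"].str[1:].astype(int),  # trailing digits as integers
    )
    ordered = helpers.sort_values(by=by)
    return ordered.drop(columns=["letter", "number"]).reset_index(drop=True)

# Letter first, then number: the same rule as custom_sort.
letter_first = sort_with_helpers(df, ["two", "letter", "number"])

# Number first, then letter: matches the order listed in the desired output.
number_first = sort_with_helpers(df, ["two", "number", "letter"])

print(letter_first)
print(number_first)
```

Because column 'two' is passed through untouched, it keeps its native numerical sort, and only the derived columns carry the custom rule.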