I have a dataset containing a bunch of data in the columns params and value. I'd like to count how many values each params label has (to use as labels in a boxplot), so I use mydf['params'].value_counts() to show this:
slidingwindow_250 11574
hotspots_1k_100 8454
slidingwindow_500 5793
slidingwindow_100 5366
hotspots_5k_500 3118
slidingwindow_1000 2898
hotspots_10k_1k 1772
slidingwindow_2500 1160
slidingwindow_5000 580
Name: params, dtype: int64
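Since the real data isn't shown, here is a small self-contained stand-in for reproducing the behaviour. The rows and values below are made up purely for illustration (only the column names params and value come from the question). It shows that value_counts() returns a Series indexed by the distinct params labels, ordered by count, largest first.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame: these rows and values are
# invented for illustration; only the column names match the question.
mydf = pd.DataFrame({
    "params": ["slidingwindow_250", "slidingwindow_250", "slidingwindow_250",
               "hotspots_1k_100", "hotspots_1k_100",
               "slidingwindow_100"],
    "value": [2.1, 2.4, 2.2, 3.0, 3.3, 1.9],
})

# value_counts() returns a Series whose index holds the distinct params
# labels, sorted by count in descending order by default.
counts = mydf["params"].value_counts()
print(counts)
```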
I have a list of all of the entries in params in the order I wish to display them in a boxplot. I try to use sort_index(level=myorder) to get them in my custom order, but the function ignores myorder and just sorts them alphabetically.
myorder = ["slidingwindow_100",
"slidingwindow_250",
"slidingwindow_500",
"slidingwindow_1000",
"slidingwindow_2500",
"slidingwindow_5000",
"hotspots_1k_100",
"hotspots_5k_500",
"hotspots_10k_1k"]
sizes_bp_log_df['params'].value_counts().sort_index(level=myorder)
hotspots_10k_1k 1772
hotspots_1k_100 8454
hotspots_5k_500 3118
slidingwindow_100 5366
slidingwindow_1000 2898
slidingwindow_250 11574
slidingwindow_2500 1160
slidingwindow_500 5793
slidingwindow_5000 580
Name: params, dtype: int64
How can I get the index of my value counts in the order I want them to be in?
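For reference, the level argument of Series.sort_index names which level(s) of a MultiIndex to sort on (by position or name); it is not a custom sort order, so on this single-level index you still get the default lexicographic sort shown above. Below is a minimal sketch of two ways to impose myorder instead, using the counts from the output above: reindex, and sort_index with a key function (the key argument needs pandas 1.1 or later). The rank dict is a hypothetical helper introduced here, not part of the original code.

```python
import pandas as pd

myorder = ["slidingwindow_100", "slidingwindow_250", "slidingwindow_500",
           "slidingwindow_1000", "slidingwindow_2500", "slidingwindow_5000",
           "hotspots_1k_100", "hotspots_5k_500", "hotspots_10k_1k"]

# Counts copied from the value_counts() output in the question.
counts = pd.Series(
    {"slidingwindow_250": 11574, "hotspots_1k_100": 8454,
     "slidingwindow_500": 5793, "slidingwindow_100": 5366,
     "hotspots_5k_500": 3118, "slidingwindow_1000": 2898,
     "hotspots_10k_1k": 1772, "slidingwindow_2500": 1160,
     "slidingwindow_5000": 580},
    name="params",
)

# Option 1: reindex() returns the rows in exactly the order of the labels
# given; a label missing from counts would appear as NaN instead of raising.
ordered = counts.reindex(myorder)

# Option 2 (pandas >= 1.1): sort_index with a key that maps each index label
# to its position in myorder, so sorting by that position follows myorder.
rank = {label: i for i, label in enumerate(myorder)}  # hypothetical helper
ordered_by_key = counts.sort_index(key=lambda idx: idx.map(rank))

print(ordered)
```

reindex is the more direct option when you already have the full list of labels; the key approach is handy when the order is better expressed as a rule than as an explicit list.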
In addition, I'll be using the median of each distribution as coordinates for the boxplot labels, which I retrieve using sizes_bp_log_df.groupby(['params']).median(); hopefully your suggested sort methods will also work for that task.
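The same ordering list should carry over to the medians. Here is a sketch under stated assumptions: the DataFrame below is a made-up stand-in using only three of the labels (the real sizes_bp_log_df isn't shown), and it selects the value column before taking the median. It shows reindexing the groupby result. As an alternative, it converts params to an ordered pd.Categorical so that both value_counts(sort=False) and groupby come out in category order without a separate reindex.

```python
import pandas as pd

# Shortened, hypothetical order list and stand-in data for illustration only.
myorder = ["slidingwindow_100", "slidingwindow_250", "hotspots_1k_100"]
df = pd.DataFrame({
    "params": ["hotspots_1k_100", "slidingwindow_250", "slidingwindow_100",
               "slidingwindow_250", "hotspots_1k_100", "slidingwindow_250"],
    "value": [3.0, 2.0, 1.0, 4.0, 5.0, 6.0],
})

# Option 1: compute the medians per params label, then reindex them with the
# same list used for the counts.
medians = df.groupby("params")["value"].median().reindex(myorder)

# Option 2: make params an ordered categorical once. value_counts(sort=False)
# then lists categories in category order, and groupby sorts its groups by
# category order; observed=True keeps only categories that actually occur.
df["params"] = pd.Categorical(df["params"], categories=myorder, ordered=True)
counts_cat = df["params"].value_counts(sort=False)
medians_cat = df.groupby("params", observed=True)["value"].median()

print(medians)
print(counts_cat)
print(medians_cat)
```

The trade-off: reindex leaves the column as plain strings but has to be applied to every result separately, while the categorical conversion is done once and every later count or groupby follows myorder automatically.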