I had a similar doubt to this category ordering issue. In the kernel Allstate EDA, skip to the part where he makes count plots of the categorical features: the order of A and B is different for cat2, cat11 and many other columns. In this case, however, we do not know in advance how many unique categories each column has, since the columns plotted in the later rows have many more. Is there an easy way to get the same order everywhere without writing a complex loop over a dictionary of the possible categories?
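As far as I can tell, when `order` is not given, seaborn's countplot orders string categories by first appearance in the data. That would explain why A and B swap between columns. A minimal sketch of a per-column order that does not depend on row order follows. It assumes the labels are codes like A, B, ..., Z, AA, AB. The helper name `category_order` and the toy data are made up for illustration.

```python
import pandas as pd


def category_order(series: pd.Series) -> list:
    """Return the distinct labels of `series` in a fixed, data-independent order.

    Hypothetical helper: labels are sorted by length first, then alphabetically,
    so single letters come before two-letter codes (A, B, ..., Z, AA, AB, ...).
    A plain sorted() would put 'AA' between 'A' and 'B'.
    """
    labels = series.dropna().unique()
    return sorted(labels, key=lambda s: (len(str(s)), str(s)))


# Made-up columns: in cat2, 'B' happens to appear before 'A'.
df = pd.DataFrame({
    "cat1": ["A", "B", "A", "B"],
    "cat2": ["B", "A", "A", "B"],
    "cat100": ["B", "AA", "A", "Z"],
})

print(list(df["cat2"].unique()))   # order of first appearance: ['B', 'A']
print(category_order(df["cat2"]))  # ['A', 'B']
print(category_order(df["cat100"]))  # ['A', 'B', 'Z', 'AA']
```

The helper works for any number of categories because it only looks at the column it is given.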
EDIT: Since the dataset is huge, I have no idea how to make it reproducible here. But to make it a bit clearer, the code uses a loop:
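import matplotlib.pyplot as plt
import seaborn as sns
# one figure per row of the grid, with n_cols count plots side by side
for i in range(n_rows):
    fig, ax = plt.subplots(nrows=1, ncols=n_cols, sharey=True, figsize=(12, 8))
    for j in range(n_cols):
        sns.countplot(x=cols[i*n_cols+j], data=dataset, ax=ax[j])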
where n_cols = 4 and n_rows = 29.
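One way the loop could be adapted is sketched below, under the assumption that computing `order` separately for each column is acceptable. The `order` list is built from that column's own labels, so it needs no dictionary of possible categories. The names `plot_counts` and `category_order` and the `demo` data are hypothetical stand-ins for the real Allstate columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def category_order(series):
    # Hypothetical helper: distinct labels sorted by length, then alphabetically.
    return sorted(series.dropna().unique(), key=lambda s: (len(str(s)), str(s)))


def plot_counts(dataset, cols, n_cols=4):
    """Draw one row of n_cols count plots per figure, each with a fixed x order."""
    n_rows = len(cols) // n_cols
    figs = []
    for i in range(n_rows):
        fig, ax = plt.subplots(nrows=1, ncols=n_cols, sharey=True, figsize=(12, 8))
        for j in range(n_cols):
            col = cols[i * n_cols + j]
            # order comes from this column alone, so 2 categories or 20 both work
            sns.countplot(x=col, data=dataset, order=category_order(dataset[col]), ax=ax[j])
        figs.append(fig)
    return figs


# Made-up data standing in for the real categorical columns.
demo = pd.DataFrame({
    "cat1": ["B", "A", "A", "B"],
    "cat2": ["A", "B", "B", "B"],
    "cat3": ["C", "A", "B", "AA"],
    "cat4": ["B", "B", "A", "A"],
})
figs = plot_counts(demo, ["cat1", "cat2", "cat3", "cat4"], n_cols=4)
```

With n_cols = 4 and n_rows = 29, this would cover 4 × 29 = 116 columns.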
The problem is that, as far as I know, order takes a list or Series, e.g. order=['A', 'B']. But in this dataset, some columns have only two categories, A and B, while others have many, and the number of categories differs from column to column.
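Another option, assuming it is fine to change the column dtypes, is to attach the order to the data itself. You would convert each column to an ordered pandas Categorical whose categories are that column's own labels. As far as I know, seaborn uses a Categorical column's categories as the default order, so `order=` would not need to be passed at all. The function name `with_fixed_categories` and the data below are hypothetical.

```python
import pandas as pd


def with_fixed_categories(df, cols):
    """Return a copy of df where each column in cols is an ordered Categorical.

    Each column gets its own categories (however many it has), sorted by
    length and then alphabetically: A, B, ..., Z, AA, AB, ...
    """
    out = df.copy()
    for col in cols:
        labels = sorted(out[col].dropna().unique(), key=lambda s: (len(str(s)), str(s)))
        out[col] = pd.Categorical(out[col], categories=labels, ordered=True)
    return out


df = pd.DataFrame({
    "cat1": ["B", "A", "B"],
    "cat2": ["AB", "C", "A"],
})
fixed = with_fixed_categories(df, ["cat1", "cat2"])
print(list(fixed["cat1"].cat.categories))  # ['A', 'B']
print(list(fixed["cat2"].cat.categories))  # ['A', 'C', 'AB']
```

The original plotting loop could then be run unchanged on `fixed` instead of `dataset`.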
(I feel like this is going nowhere)