I often have to convert even continuous data into a categorical dtype, since it helps my statistical analysis.
When I apply boolean indexing (values < 11) to a categorical column, the result is not sliced as expected:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
### MAKE TESTDATA
df = sns.load_dataset("fmri")
df["timepoint"] = pd.Categorical(df["timepoint"], ordered=True)
### PERFORM BOOLEAN SLICING
df = df.loc[df["timepoint"] < 11]
# df = df.where(df["timepoint"] < 11) # SAME RESULT
g = sns.catplot(data=df, y="signal", x="timepoint")
This yields an incorrect plot. The x-axis still runs up to 18, although the data points from 11 onward were correctly sliced away.
Cause:
The categorical data was sliced, but its list of categories (`.cat.categories`) ignored the slicing operation. Seaborn seems to use the categories, not the remaining values, to lay out the x-axis.
>>> print(df.timepoint.cat.categories)
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], dtype='int64')
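The same effect can be reproduced without the seaborn dataset. Here is a minimal sketch, assuming a small made-up frame (`demo` is a hypothetical stand-in for the fmri data, with one row per timepoint 0 to 18):

```python
import pandas as pd

# Hypothetical stand-in for the fmri data: timepoints 0..18, one row each
demo = pd.DataFrame({
    "timepoint": range(19),
    "signal": [0.1 * i for i in range(19)],
})
demo["timepoint"] = pd.Categorical(demo["timepoint"], ordered=True)

# Same boolean slicing as in the question
sliced = demo.loc[demo["timepoint"] < 11]

# The rows at 11 and above are gone ...
remaining_values = sorted(sliced["timepoint"].tolist())

# ... but the dtype still carries all 19 original categories
remaining_categories = sliced["timepoint"].cat.categories.tolist()

print(remaining_values)
print(remaining_categories)
```

The values and the categories disagree after slicing, which would explain why a plot that is laid out from the categories still shows the removed timepoints.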
What would make it work:
Performing the slicing BEFORE converting to categorical leads to the desired behavior. So does converting the categorical type back to numerical and then to categorical again. However, I doubt that this is the way it is intended.
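For comparison, here is a rough sketch of two ways to end up with matching categories, again on made-up data (`raw`, `early` and `late` are hypothetical names). The first is the slice-before-convert workaround described above. The second uses pandas' `Series.cat.remove_unused_categories()` after slicing, which may be closer to the intended route, though that is an assumption on my part and not something the plot above confirms:

```python
import pandas as pd

# Hypothetical stand-in for the fmri data: timepoints 0..18, one row each
raw = pd.DataFrame({
    "timepoint": range(19),
    "signal": [0.1 * i for i in range(19)],
})

# Option 1: slice first, then convert to categorical
early = raw.loc[raw["timepoint"] < 11].copy()
early["timepoint"] = pd.Categorical(early["timepoint"], ordered=True)

# Option 2: convert first, slice, then drop the categories that no longer occur
late = raw.copy()
late["timepoint"] = pd.Categorical(late["timepoint"], ordered=True)
late = late.loc[late["timepoint"] < 11].copy()
late["timepoint"] = late["timepoint"].cat.remove_unused_categories()

early_categories = early["timepoint"].cat.categories.tolist()
late_categories = late["timepoint"].cat.categories.tolist()

print(early_categories)
print(late_categories)
```

In both cases the categories only cover the timepoints that survive the slicing, and the column stays an ordered categorical, so no round trip through a numerical dtype is needed.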