
I have a dataframe that I am trying to plot. I would like the data points to appear in sorted order along the x-axis. I tried sorting the dataframe before passing it to ggplot, but the order is disregarded. My data is shown below; I want to sort on the 'value' column.

       var1     var2  value     direction
0      PM25     PBAR  0.012001          1
1      PM25  DELTA_T  0.091262          1
2      PM25       RH  0.105857          1
3      PM25      WDV  0.119452          0
4      PM25     T10M  0.119506          0
5      PM25      T2M  0.129869          0
6      PM25     SRAD  0.134718          0
7      PM25      WSA  0.169000          0
8      PM25      WSM  0.174202          0
9      PM25      WSV  0.181596          0
10     PM25      SGT  0.263590          1

This is what my code looks like currently:

tix = np.linspace(0,.3,10)
corr = corr.sort_values(by='value').reset_index(drop = True)
p = ggplot(data = corr, mapping = aes(x='var2', y='value')) +\
  geom_point(mapping = aes(fill = 'direction')) + ylab('Correlation') + ggtitle('Correlation to PM25') +\
  theme_classic() +  scale_y_continuous(breaks = tix, limits = [0, .3])

print(p)

This produces the following plot (image not included).

Ehsan
Adam Conrad
  • By default, `ggplot2` treats any text variable on the x-axis as a factor and sets its levels in alphabetical order. To get the behavior you want, make your `Variable` column a factor with its levels in the order you need, then recreate the plot. – statstew Jun 22 '20 at 04:57
  • I only realized after hitting submit that this is a Python implementation. The comment above is what you would need to do in R. It may or may not still be applicable. – statstew Jun 22 '20 at 05:00

2 Answers


You can do this in two ways:

  1. Make sure the variable mapped to the x-axis is a categorical whose categories are in the desired order. The code below relies on the fact that pd.unique returns values in order of appearance, so the dataframe is sorted first and the sorted result is assigned back.
corr = corr.sort_values(by='value').reset_index(drop = True)
corr['var2'] = pd.Categorical(corr.var2, categories=pd.unique(corr.var2))
...
  2. Plotnine has an internal function, reorder (introduced in v0.7.0), which you can use inside an aes() call to order the values of one variable by the values of another. See the documentation for reorder at the bottom of the page.
# no need to sort values
p = ggplot(data = corr, mapping = aes(x='reorder(var2, value)', y='value')) +\
...
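As a rough, self-contained sketch of the first approach: the snippet below uses four rows taken from the question's table, deliberately out of order. The helper names `order_by_value` and `build_plot` are hypothetical and not part of pandas or plotnine; the plot function assumes plotnine is installed and imports it lazily, so the ordering step can be checked on its own.

```python
import pandas as pd

# A few rows from the question's table, deliberately out of order.
corr = pd.DataFrame({
    "var2": ["SGT", "PBAR", "WSV", "RH"],
    "value": [0.263590, 0.012001, 0.181596, 0.105857],
    "direction": [1, 1, 0, 1],
})


def order_by_value(df, cat_col="var2", value_col="value"):
    """Hypothetical helper: sort by value_col, then make cat_col a
    categorical whose categories follow the sorted order of appearance."""
    out = df.sort_values(by=value_col).reset_index(drop=True)
    out[cat_col] = pd.Categorical(out[cat_col], categories=pd.unique(out[cat_col]))
    return out


def build_plot(df):
    """Hypothetical helper: build the plot (assumes plotnine is installed)."""
    from plotnine import ggplot, aes, geom_point, ylab, ggtitle, theme_classic

    return (
        ggplot(df, aes(x="var2", y="value"))
        + geom_point(aes(fill="direction"))
        + ylab("Correlation")
        + ggtitle("Correlation to PM25")
        + theme_classic()
    )


ordered = order_by_value(corr)
print(list(ordered["var2"].cat.categories))  # ['PBAR', 'RH', 'WSV', 'SGT']
```

Calling `build_plot(ordered)` should then lay the x-axis out in category order rather than alphabetically, since the categorical carries the ordering.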
has2k1

I couldn't get reorder() to work, but I was able to control the order with scale_x_discrete(). See https://stackoverflow.com/a/63578556/7685
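One way this might look, as a minimal sketch rather than the code from the linked answer: compute the desired label order from the data and pass it as the `limits` of `scale_x_discrete()`. The sample rows come from the question's table; `x_order` and `build_plot` are illustrative names, and plotnine is assumed to be installed when `build_plot` is called.

```python
import pandas as pd

# A few rows from the question's table, deliberately out of order.
corr = pd.DataFrame({
    "var2": ["SGT", "PBAR", "WSV", "RH"],
    "value": [0.263590, 0.012001, 0.181596, 0.105857],
})

# Category labels listed in ascending order of their value.
x_order = corr.sort_values(by="value")["var2"].tolist()


def build_plot(df, limits):
    """Hypothetical helper: plot with an explicit x-axis order
    (assumes plotnine is installed)."""
    from plotnine import ggplot, aes, geom_point, scale_x_discrete

    return (
        ggplot(df, aes(x="var2", y="value"))
        + geom_point()
        + scale_x_discrete(limits=limits)
    )


print(x_order)  # ['PBAR', 'RH', 'WSV', 'SGT']
```

With this approach the dataframe itself does not need to be sorted or converted to a categorical; the scale alone fixes the axis order.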

Chris Nelson