I was making box plots with three variables in ggplot2 and struggled for a long time to get what I wanted, because the data on my x axis was numeric rather than a factor. After reading the documentation (http://www.sthda.com/english/wiki/ggplot2-box-plot-quick-start-guide-r-software-and-data-visualization) and other questions here (Modify x-axis labels in each facet and ggplot: arranging boxplots of multiple y-variables for each group of a continuous x), I understood that I had to convert my x-axis data to a factor.

> str(HT_2)
Classes ‘data.table’ and 'data.frame':  540 obs. of  4 variables:
 $ T             : int  -1 -2 -3 -4 0 1 2 3 4 -1 ...
 $ Month         : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Gauge         : chr  "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
 $ Norm_Flow_m3_s: num  1.49 1.77 1.99 2.02 1.17 ...
 - attr(*, ".internal.selfref")=<externalptr> 

The following line of code solved my problem

ggplot(HT_2, aes(x=Month, y=Norm_Flow_m3_s,fill=Gauge)) + geom_boxplot()

But the same line fails when the x-axis column holds integer data:

> str(HT_2)
Classes ‘data.table’ and 'data.frame':  540 obs. of  4 variables:
 $ T             : int  -1 -2 -3 -4 0 1 2 3 4 -1 ...
 $ Month         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Gauge         : chr  "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
 $ Norm_Flow_m3_s: num  1.49 1.77 1.99 2.02 1.17 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "Month"

It still fails even if I try to group by my x-axis variable:

ggplot(HT_2, aes(x=Month, y=Norm_Flow_m3_s,fill=Gauge,group=Month)) + geom_boxplot()

Does anyone know why this happens? Or how could I use numeric data on the x axis of a box plot?
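A likely explanation is ggplot2's default grouping. When `group` is not set, it is built from the discrete variables in the plot, so an integer `Month` does not contribute to it, and setting `group=Month` replaces the split by `Gauge` rather than adding to it. Below is a minimal sketch of one way to keep `Month` numeric by grouping on both variables. The `demo` data frame and its values are made up for illustration and only stand in for `HT_2`.

```r
library(ggplot2)

# Hypothetical stand-in for HT_2: integer Month, two gauges, random flows.
set.seed(1)
demo <- data.frame(
  Month = rep(1:12, each = 20),
  Gauge = rep(c("Aconcagua", "Other"), times = 120),  # "Other" is an invented gauge name
  Norm_Flow_m3_s = runif(240, 0.5, 2.5)
)

# group = Month overrides the default grouping: one box per month,
# and the split by Gauge is lost.
p_month_only <- ggplot(demo, aes(x = Month, y = Norm_Flow_m3_s,
                                 fill = Gauge, group = Month)) +
  geom_boxplot()

# Grouping on the Month/Gauge combination keeps Month numeric
# and still draws one box per gauge within each month.
p_both <- ggplot(demo, aes(x = Month, y = Norm_Flow_m3_s,
                           fill = Gauge,
                           group = interaction(Month, Gauge))) +
  geom_boxplot()

# Count the boxes each plot would draw (one row per box).
n_month_only <- nrow(suppressWarnings(layer_data(p_month_only)))
n_both <- nrow(layer_data(p_both))
```

Printing `p_both` draws the plot. The numeric axis can then be given one tick per month with `scale_x_continuous(breaks = 1:12)` if needed.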

Juan Ossa
  • `ggplot2` relies on factors to perform a lot of the underlying computation. It's just how it was built. If you don't want to convert a variable to a factor in your data frame, you can use the `factor` function inside `aes`, e.g. `ggplot(HT_2, aes(x=factor(Month), y=Norm_Flow_m3_s,fill=Gauge)) + geom_boxplot()`. Factors are a good way to represent categorical data (such as months). Why do you want numeric data in your plots? – Amar May 15 '18 at 03:22
  • Thanks for the tip about using the `factor` function inside `aes`; I had not thought of it. I know factors are useful when dealing with categorical variables, but I wanted to understand the reasons behind this requirement. – Juan Ossa May 15 '18 at 03:41
  • If you are intrigued by the reasoning behind the `tidyverse` API, you can always email or ping the author. But in my opinion, unless it's pivotal to your work, just accept that ggplot2 uses factors. – Amar May 15 '18 at 04:48
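The two routes mentioned in the comments, wrapping the variable in `factor()` inside `aes` or converting the column beforehand, should give the same boxes. Here is a small sketch of both, again on an invented `demo` data frame rather than the real `HT_2`. The `Month_f` column is a hypothetical name.

```r
library(ggplot2)

# Hypothetical data shaped like HT_2 (integer Month, character Gauge).
set.seed(2)
demo <- data.frame(
  Month = rep(1:12, each = 20),
  Gauge = rep(c("Aconcagua", "Other"), times = 120),  # "Other" is an invented gauge name
  Norm_Flow_m3_s = runif(240, 0.5, 2.5)
)

# Route 1: convert on the fly inside aes(); the data frame is unchanged.
p_inline <- ggplot(demo, aes(x = factor(Month), y = Norm_Flow_m3_s, fill = Gauge)) +
  geom_boxplot()

# Route 2: store a factor column once and map it directly.
demo$Month_f <- factor(demo$Month)
p_column <- ggplot(demo, aes(x = Month_f, y = Norm_Flow_m3_s, fill = Gauge)) +
  geom_boxplot()

# Both plots should contain one box per Month/Gauge combination.
n_inline <- nrow(layer_data(p_inline))
n_column <- nrow(layer_data(p_column))
```

Route 1 leaves the integer column available for arithmetic elsewhere. Route 2 avoids repeating the conversion when several plots use the same variable.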
