
My data frame is simple (though it may not strictly be a data frame):

          date        MAE_f0        MAE_f1
   1  20140101           0.2           0.2
   2  20140102           1.9           0.1
   3  20140103           0.1           0.3
   4  20140104           7.8          15.9
   5  20140105           1.9           4.6
   6  20140106           0.8           0.8
   7  20140107           0.5           0.6
   8  20140108           0.2           0.2
   9  20140109           0.2           0.2
   10 20140110           0.8           1.1
   11 20140111           0.2           0.2
   12 20140112           0.4           0.4
   13 20140113           2.8           0.9
   14 20140114           5.4           5.8
   15 20140115           0.2           0.3
   16 20140116           4.9           3.1
   17 20140117           3.7           6.0
   18 20140118           1.4           2.1
   19 20140119           0.9           3.0
   20 20140120           0.2           3.6
   21 20140121           0.3           0.3
   22 20140122           0.4           0.4
   23 20140123           0.6           1.7
   24 20140124           6.1           4.7
   25 20140125           0.1           0.0
   26 20140126           7.4           4.9
   27 20140127           0.8           0.9
   28 20140128           0.3           0.3
   29 20140129           3.0           4.2
   30 20140130           9.9          17.3

For every day I have 2 variables: MAE for f0 and MAE for f1.

I can compute the frequency intervals for my 2 variables over the whole time period using "cut", with the same breaks for both:

cut(mae.df$MAE_f0,c(0,2,5,10,50))

cut(mae.df$MAE_f1,c(0,2,5,10,50))

Well. Now I can use boxplot to plot each variable against its frequency distribution:

boxplot(mae.df$MAE_f0~cut(mae.df$MAE_f0,c(0,2,5,10,50)))

boxplot(mae.df$MAE_f1~cut(mae.df$MAE_f1,c(0,2,5,10,50)))

The two boxplots produced are very simple (I can't show them because I have no "reputation"): on x there are the frequency intervals (0-2, 2-5, 5-10, 10-50), and on y the boxplot of the variable (MAE_f0 or MAE_f1) for each interval.

Well, the question is very trivial: I'd like to have only one plot, with both variables MAE_f0 and MAE_f1 and their frequency distributions. That is, a plot with 2 boxplots for each frequency interval (I mean: 2 for 0-2, 2 for 2-5 and so on).

I know that my knowledge of R, data frames and so on is very poor and that I'm missing something important about these topics, especially data frames and reshaping. Sorry in advance for that! I've seen some nice examples on Stack Overflow about grouped boxplots, all without a time variable, but I can't figure out how to adjust my data frame to do that.

I hope my question is not misplaced: sorry again for that.

Umbe

  • I think [**this post**](http://stackoverflow.com/questions/14604439/plot-multiple-boxplot-in-one-graph/14605817#14605817) should get you started. – Henrik Jan 05 '15 at 15:19

2 Answers

Here is how I would do this. I think it makes sense to melt your data first. A quick tutorial on melting your data is available here.

# First, make this reproducible by using dput for the data frame
df <- structure(list(date = 20140101:20140130, MAE_f0 = c(0.2, 1.9, 0.1, 7.8, 1.9, 0.8, 0.5, 0.2, 0.2, 0.8, 0.2, 0.4, 2.8, 5.4, 0.2, 4.9, 3.7, 1.4, 0.9, 0.2, 0.3, 0.4, 0.6, 6.1, 0.1, 7.4, 0.8, 0.3, 3, 9.9), MAE_f1 = c(0.2, 0.1, 0.3, 15.9, 4.6, 0.8, 0.6, 0.2, 0.2, 1.1, 0.2, 0.4, 0.9, 5.8, 0.3, 3.1, 6, 2.1, 3, 3.6, 0.3, 0.4, 1.7, 4.7, 0, 4.9, 0.9, 0.3, 4.2, 17.3)), .Names = c("date", "MAE_f0", "MAE_f1"), row.names = c(NA, -30L), class = "data.frame")

library(ggplot2)   # plotting
library(reshape2)  # melt()

# Melt the original data frame
df2 <- melt(df, measure.vars = c("MAE_f0", "MAE_f1"))
head(df2)
#       date variable value
# 1 20140101   MAE_f0   0.2
# 2 20140102   MAE_f0   1.9
# 3 20140103   MAE_f0   0.1
# 4 20140104   MAE_f0   7.8
# 5 20140105   MAE_f0   1.9
# 6 20140106   MAE_f0   0.8

# Create a "cuts" variable with the correct breaks
df2$cuts <- cut(df2$value, 
                breaks = c(-Inf, 2, 5, 10, +Inf), 
                labels = c("first cut", "second cut", "third cut", "fourth cut"))
head(df2)
#       date variable value      cuts
# 1 20140101   MAE_f0   0.2 first cut
# 2 20140102   MAE_f0   1.9 first cut
# 3 20140103   MAE_f0   0.1 first cut
# 4 20140104   MAE_f0   7.8 third cut
# 5 20140105   MAE_f0   1.9 first cut
# 6 20140106   MAE_f0   0.8 first cut

# Plotting
ggplot(df2, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot() +
  facet_wrap(~ cuts, nrow = 1)

Result: [image: resulting graph]
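Before plotting, it can help to count how many observations fall into each interval for each variable. The following is a minimal base-R sketch, not part of the answer above: it uses `stack()` in place of `melt()`, and the names `long`, `counts` and `dropped` are made up for illustration. It also shows why starting the breaks at `-Inf` matters here: `cut()` intervals are right-closed by default, so with the question's breaks `c(0, 2, 5, 10, 50)` the single 0.0 in MAE_f1 falls outside `(0,2]` and becomes `NA`.

```r
# Values copied from the question
df <- data.frame(
  date   = 20140101:20140130,
  MAE_f0 = c(0.2, 1.9, 0.1, 7.8, 1.9, 0.8, 0.5, 0.2, 0.2, 0.8,
             0.2, 0.4, 2.8, 5.4, 0.2, 4.9, 3.7, 1.4, 0.9, 0.2,
             0.3, 0.4, 0.6, 6.1, 0.1, 7.4, 0.8, 0.3, 3.0, 9.9),
  MAE_f1 = c(0.2, 0.1, 0.3, 15.9, 4.6, 0.8, 0.6, 0.2, 0.2, 1.1,
             0.2, 0.4, 0.9, 5.8, 0.3, 3.1, 6.0, 2.1, 3.0, 3.6,
             0.3, 0.4, 1.7, 4.7, 0.0, 4.9, 0.9, 0.3, 4.2, 17.3)
)

# Base-R stand-in for melt(): stack the two measure columns.
# The result has a numeric column "values" and a factor column "ind".
long <- stack(df[, c("MAE_f0", "MAE_f1")])

# Same breaks as in the answer; -Inf keeps the 0.0 observation
long$cuts <- cut(long$values, breaks = c(-Inf, 2, 5, 10, Inf))

# Rows: variable, columns: interval
counts <- table(long$ind, long$cuts)
print(counts)

# With the question's original breaks, the 0.0 in MAE_f1 is not binned
dropped <- sum(is.na(cut(df$MAE_f1, breaks = c(0, 2, 5, 10, 50))))
print(dropped)
```

The MAE_f0 row of `counts` has a zero in the last interval, which is why that facet shows only one box. If you prefer to keep 0 as the lower break, `include.lowest = TRUE` in `cut()` is another way to retain the 0.0 value.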

JasonAizkalns

Here is one way. First, reshape your data into long format. Then add a fake data point: I noticed that MAE_f0 has no values in the (10,50] interval (frequency 10-50), so that interval would otherwise get only one box. Combine the reshaped data with the fake row. When you draw the figure, use coord_cartesian with the y range of the original data set, so the fake value (60) falls outside the visible area. I hope this gives you the graphic you want. Here, your data is called mydf.

library(dplyr)
library(tidyr)
library(ggplot2)


mydf <- structure(list(V1 = 1:30, V2 = 20140101:20140130, V3 = c(0.2, 
1.9, 0.1, 7.8, 1.9, 0.8, 0.5, 0.2, 0.2, 0.8, 0.2, 0.4, 2.8, 5.4, 
0.2, 4.9, 3.7, 1.4, 0.9, 0.2, 0.3, 0.4, 0.6, 6.1, 0.1, 7.4, 0.8, 
0.3, 3, 9.9), V4 = c(0.2, 0.1, 0.3, 15.9, 4.6, 0.8, 0.6, 0.2, 
0.2, 1.1, 0.2, 0.4, 0.9, 5.8, 0.3, 3.1, 6, 2.1, 3, 3.6, 0.3, 
0.4, 1.7, 4.7, 0, 4.9, 0.9, 0.3, 4.2, 17.3)), .Names = c("V1", 
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -30L
))

ana <- select(mydf, -V1) %>%
       rename(date = V2, MAE_f0 = V3, MAE_f1 = V4) %>%
       gather(variable, value, -date) %>%
       mutate(frequency = cut(value, breaks = c(-Inf,2,5,10,50)))

# Create a fake one-row data frame: MAE_f0 has no values in (10,50]
extra <- data.frame(date = 20140101,
                    variable = "MAE_f0",
                    value = 60,
                    frequency = "(10,50]")

new <- rbind(ana, extra)


ggplot(data = new, aes(x = frequency, y = value, fill = variable)) +
  geom_boxplot(position = "dodge") +
  coord_cartesian(ylim = range(ana$value) + c(-0.25, 0.25))

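Since the question started from base `boxplot()`, the same "two boxes per interval" layout can also be sketched in base graphics, where an empty group simply leaves an empty slot and no placeholder row is needed. This is an alternative sketch under my own assumptions, not the approach used above: the interval labels, the colours and the names `long` and `bp` are all made up for illustration.

```r
# Values copied from the question
df <- data.frame(
  date   = 20140101:20140130,
  MAE_f0 = c(0.2, 1.9, 0.1, 7.8, 1.9, 0.8, 0.5, 0.2, 0.2, 0.8,
             0.2, 0.4, 2.8, 5.4, 0.2, 4.9, 3.7, 1.4, 0.9, 0.2,
             0.3, 0.4, 0.6, 6.1, 0.1, 7.4, 0.8, 0.3, 3.0, 9.9),
  MAE_f1 = c(0.2, 0.1, 0.3, 15.9, 4.6, 0.8, 0.6, 0.2, 0.2, 1.1,
             0.2, 0.4, 0.9, 5.8, 0.3, 3.1, 6.0, 2.1, 3.0, 3.6,
             0.3, 0.4, 1.7, 4.7, 0.0, 4.9, 0.9, 0.3, 4.2, 17.3)
)

# Long format: numeric column "values", factor column "ind"
long <- stack(df[, c("MAE_f0", "MAE_f1")])

# Bin both variables with the same breaks (labels are illustrative)
long$cuts <- cut(long$values,
                 breaks = c(-Inf, 2, 5, 10, Inf),
                 labels = c("0-2", "2-5", "5-10", ">10"))

# One box per variable/interval pair; "ind" varies fastest, so the two
# variables sit next to each other within every interval
box_cols <- c("tomato", "skyblue")
bp <- boxplot(values ~ ind + cuts, data = long,
              col = box_cols, las = 2, xlab = "", ylab = "MAE")
legend("topleft", legend = levels(long$ind), fill = box_cols)

# Number of observations behind each box, in plotting order
print(bp$n)
```

The colours are recycled across the eight slots, so every MAE_f0 box gets the first colour and every MAE_f1 box the second.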

jazzurro