
Although this question was asked previously here, I have a new data frame, hence a new question. A sample of the data is shown below:

ID,Region,Dimension,BlogsInd.,BlogsNews,BlogsTech,Columns
1,PK,Dim1,-4.75,NA,NA,NA
2,PK,Dim1,-5.69,NA,NA,NA
3,PK,Dim1,-0.27,NA,NA,NA
4,PK,Dim1,-2.76,NA,NA,NA
5,PK,Dim1,-8.24,NA,NA,NA
6,PK,Dim1,-12.51,NA,NA,NA
7,PK,Dim1,-1.28,NA,NA,NA
8,PK,Dim1,0.95,NA,NA,NA
9,PK,Dim1,-5.96,NA,NA,NA
10,PK,Dim1,-8.81,NA,NA,NA
11,PK,Dim1,-8.46,NA,NA,NA
12,PK,Dim1,-6.15,NA,NA,NA
13,PK,Dim1,-13.98,NA,NA,NA
14,PK,Dim1,-16.43,NA,NA,NA
15,PK,Dim1,-4.09,NA,NA,NA
16,PK,Dim1,-11.06,NA,NA,NA
17,PK,Dim1,-9.04,NA,NA,NA
18,PK,Dim1,-8.56,NA,NA,NA
19,PK,Dim1,-8.13,NA,NA,NA
20,PK,Dim2,-14.46,NA,NA,NA
21,PK,Dim2,-4.21,NA,NA,NA
22,PK,Dim2,-4.96,NA,NA,NA
23,PK,Dim2,-5.48,NA,NA,NA
24,PK,Dim2,-4.53,NA,NA,NA
25,PK,Dim2,6.31,NA,NA,NA
26,PK,Dim2,-11.16,NA,NA,NA
27,PK,Dim2,-1.27,NA,NA,NA
28,PK,Dim2,-11.49,NA,NA,NA
29,PK,Dim2,-0.9,NA,NA,NA
30,PK,Dim2,-12.27,NA,NA,NA
31,PK,Dim2,6.85,NA,NA,NA
32,PK,Dim2,-5.21,NA,NA,NA
33,PK,Dim2,-1.06,NA,NA,NA
34,PK,Dim2,-2.6,NA,NA,NA
35,PK,Dim2,-0.95,NA,NA,NA
36,PK,Dim3,-0.82,NA,NA,NA
37,PK,Dim3,-7.65,NA,NA,NA
38,PK,Dim3,0.64,NA,NA,NA
39,PK,Dim3,-2.25,NA,NA,NA
40,PK,Dim3,-1.58,NA,NA,NA
41,PK,Dim3,-5.73,NA,NA,NA
42,PK,Dim3,0.37,NA,NA,NA
43,PK,Dim3,-5.46,NA,NA,NA
44,PK,Dim3,-3.48,NA,NA,NA
45,PK,Dim3,0.88,NA,NA,NA
46,PK,Dim3,-2.11,NA,NA,NA
47,PK,Dim3,-10.13,NA,NA,NA
48,PK,Dim3,-2.08,NA,NA,NA
49,PK,Dim3,-4.33,NA,NA,NA
50,PK,Dim3,1.09,NA,NA,NA
51,PK,Dim3,-4.23,NA,NA,NA
52,PK,Dim3,-1.46,NA,NA,NA
53,PK,Dim3,9.37,NA,NA,NA
54,PK,Dim3,5.84,NA,NA,NA
55,PK,Dim3,8.21,NA,NA,NA
56,PK,Dim3,7.34,NA,NA,NA
57,PK,Dim4,1.83,NA,NA,NA
58,PK,Dim4,14.39,NA,NA,NA
59,PK,Dim4,22.02,NA,NA,NA
60,PK,Dim4,4.83,NA,NA,NA
61,PK,Dim4,-3.24,NA,NA,NA
62,PK,Dim4,-5.69,NA,NA,NA
63,PK,Dim4,-22.92,NA,NA,NA
64,PK,Dim4,0.41,NA,NA,NA
65,PK,Dim4,-4.42,NA,NA,NA
66,PK,Dim4,-10.72,NA,NA,NA
67,PK,Dim4,-11.29,NA,NA,NA
68,PK,Dim4,-2.89,NA,NA,NA
69,PK,Dim4,-7.59,NA,NA,NA
70,PK,Dim4,-7.45,NA,NA,NA
71,US,Dim1,-12.49,NA,NA,NA
72,US,Dim1,-11.59,NA,NA,NA
73,US,Dim1,-4.6,NA,NA,NA
74,US,Dim1,-22.83,NA,NA,NA
75,US,Dim1,-4.83,NA,NA,NA
76,US,Dim1,-14.76,NA,NA,NA
77,US,Dim1,-15.93,NA,NA,NA
78,US,Dim1,-2.78,NA,NA,NA
79,US,Dim1,-16.39,NA,NA,NA
80,US,Dim1,-15.22,NA,NA,NA
81,US,Dim1,3.25,NA,NA,NA
82,US,Dim1,-2.73,NA,NA,NA
83,US,Dim1,0.96,NA,NA,NA
84,US,Dim1,-1.12,NA,NA,NA
85,US,Dim1,-0.33,NA,NA,NA
86,US,Dim1,-6.45,NA,NA,NA
87,US,Dim1,2.52,NA,NA,NA
88,US,Dim1,3.18,NA,NA,NA
89,US,Dim1,4.65,NA,NA,NA
90,US,Dim2,-1.75,NA,NA,NA
91,US,Dim2,-0.22,NA,NA,NA
92,US,Dim2,8.16,NA,NA,NA
93,US,Dim2,1.89,NA,NA,NA
94,US,Dim2,4.31,NA,NA,NA
95,US,Dim2,-0.41,NA,NA,NA
96,US,Dim2,-23.02,NA,NA,NA
97,US,Dim2,3.87,NA,NA,NA
98,US,Dim2,-4.76,NA,NA,NA
99,US,Dim2,4.95,NA,NA,NA
100,US,Dim2,4.78,NA,NA,NA
101,US,Dim2,-15.11,NA,NA,NA
102,US,Dim2,-3.74,NA,NA,NA
103,US,Dim2,-6.15,NA,NA,NA
104,US,Dim2,-8.33,NA,NA,NA
105,US,Dim2,-5.55,NA,NA,NA
106,US,Dim3,-5.1,NA,NA,NA
107,US,Dim3,-0.41,NA,NA,NA
108,US,Dim3,-8,NA,NA,NA
109,US,Dim3,-11.8,NA,NA,NA
110,US,Dim3,-10.39,NA,NA,NA
111,US,Dim3,-14.98,NA,NA,NA
112,US,Dim3,-13.14,NA,NA,NA
113,US,Dim3,-16.06,NA,NA,NA
114,US,Dim3,-16.75,NA,NA,NA
115,US,Dim3,-17.58,NA,NA,NA
116,US,Dim3,-13.12,NA,NA,NA
117,US,Dim3,-15.69,NA,NA,NA
118,US,Dim3,-9.29,NA,NA,NA
119,US,Dim3,-14.93,NA,NA,NA
120,US,Dim3,-18.75,NA,NA,NA
121,US,Dim3,-16.15,NA,NA,NA
122,US,Dim3,-14.38,NA,NA,NA
123,US,Dim3,-11.33,NA,NA,NA
124,US,Dim3,2.06,NA,NA,NA
125,US,Dim3,1.55,NA,NA,NA
126,US,Dim3,3.17,NA,NA,NA
127,US,Dim4,3.33,NA,NA,NA
128,US,Dim4,-3.31,NA,NA,NA
129,US,Dim4,5.67,NA,NA,NA
130,US,Dim4,-1.94,NA,NA,NA
131,US,Dim4,-4.2,NA,NA,NA
132,US,Dim4,-13.53,NA,NA,NA
133,US,Dim4,-10.84,NA,NA,NA
134,US,Dim4,-1.04,NA,NA,NA
135,US,Dim4,-8.02,NA,NA,NA
136,US,Dim4,-14.65,NA,NA,NA
137,US,Dim4,-6.39,NA,NA,NA
138,US,Dim4,-3.69,NA,NA,NA
139,US,Dim4,-11.62,NA,NA,NA
140,US,Dim4,-3.02,NA,NA,NA
141,US,Dim4,-28.84,NA,NA,NA

I am trying to create a grouped box plot (using a function) with the mean value of each group shown on its box. The code is below:

library(ggplot2)
library(reshape2)  # for melt()

plotgraph <- function(x, y, colour, min, max) {
  plot1 <- ggplot(dims_Blog, aes_string(x = x, y = y, fill = colour)) +
    geom_boxplot() +
    labs(fill = colour) +
    #scale_y_continuous(breaks = c(seq(min, max, 5)), limits = c(min, max)) +
    labs(x = "Dimensions", y = "Dimension Score") +
    scale_fill_grey(start = 0.3, end = 0.7) +
    theme_grey() +
    theme(legend.justification = c(1, 1), legend.position = c(1, 1)) +
    # Mean of y for every (x, colour) combination: tapply() returns a
    # Dimension-by-Region matrix, which melt() turns into a long data frame
    geom_text(data = melt(with(dims_Blog,
                               tapply(eval(parse(text = y)),
                                      list(eval(parse(text = x)), eval(parse(text = colour))),
                                      mean)),
                          varnames = c("Dimension", "Region"), value.name = "med"),
              aes_string(y = "med", x = x, label = "round(med,3)"),
              position = position_dodge(width = 0.8),
              size = 3, vjust = -0.5, colour = "white")
  return(plot1)
}
plot1 <- plotgraph("Dimension", "BlogsInd.", "Region")

I am having trouble understanding the part starting with `geom_text`, where the data for the mean values is passed in. The result is melted (converted from wide to long format), which I think is not required in this scenario because the data is already in long format (one row per observation). I tried to use the `stat_summary` function, with no success. Any help finding a solution would be appreciated.
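The question mentions trying `stat_summary()`. It can compute and draw the group means directly, with no separate summary data frame. A minimal sketch, assuming ggplot2 >= 3.3.0 (for the `fun` argument and `after_stat()`); the small `dims_Blog` below is a hypothetical stand-in built from the first three values of each PK/US, Dim1/Dim2 group in the sample above. Giving the boxes and the labels the same explicit `position_dodge()` width is what keeps each label above its own box.

```r
library(ggplot2)

# Hypothetical stand-in for dims_Blog: a subset of the sample values above
dims_Blog <- data.frame(
  Dimension = rep(c("Dim1", "Dim2"), each = 3, times = 2),
  Region    = rep(c("PK", "US"), each = 6),
  BlogsInd. = c(-4.75, -5.69, -0.27, -14.46, -4.21, -4.96,
                -12.49, -11.59, -4.6, -1.75, -0.22, 8.16)
)

p <- ggplot(dims_Blog, aes(x = Dimension, y = BlogsInd., fill = Region)) +
  # Same explicit dodge width for boxes and labels so they line up
  geom_boxplot(width = 0.7, position = position_dodge(width = 0.8)) +
  # stat_summary() computes the mean per Dimension x Region group;
  # after_stat(y) refers to that computed mean
  stat_summary(fun = mean, geom = "text",
               aes(label = after_stat(round(y, 2))),
               position = position_dodge(width = 0.8),
               vjust = -0.5, size = 3)
print(p)
```

Because the means come from the same grouping as the boxes (`fill = Region` within each `Dimension`), nothing has to be melted or merged back in.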

Shakir

1 Answer


Indeed, melting the data seems superfluous. Instead, you should summarise the data first, for instance with dplyr:

library(dplyr)
library(ggplot2)
ggplot(dims_Blog, aes(x = Dimension, y = BlogsInd., fill = Region)) +
  geom_boxplot() +
  # One row per Dimension x Region combination, holding the group mean
  geom_text(data = dims_Blog %>% group_by(Dimension, Region) %>% summarise(mean = mean(BlogsInd.)),
            aes(x = Dimension, y = mean, label = round(mean, 2)),
            position = position_dodge(width = .7))

Then fine-tune the positioning and formatting.

Edit: I did not click through to your previous question, which already extends the example above to avoid non-standard evaluation (NSE) in a programming context. So use `group_by_` and `aes_string` in your function.
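In current releases, `aes_string()` (ggplot2) and underscore verbs such as `group_by_()` (dplyr) are deprecated. A minimal sketch of the same function using the `.data` pronoun and `all_of()` instead, assuming ggplot2 >= 3.3.0 and dplyr >= 1.0.0. Passing the data frame as an argument and the column name `mean_y` are choices made here, not part of the original code; the small `dims_Blog` is a hypothetical stand-in built from a subset of the sample values.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dims_Blog: a subset of the sample values
dims_Blog <- data.frame(
  Dimension = rep(c("Dim1", "Dim2"), each = 3, times = 2),
  Region    = rep(c("PK", "US"), each = 6),
  BlogsInd. = c(-4.75, -5.69, -0.27, -14.46, -4.21, -4.96,
                -12.49, -11.59, -4.6, -1.75, -0.22, 8.16)
)

# x, y and colour are column names given as strings
plotgraph <- function(data, x, y, colour) {
  # Group means; columns are looked up by name, na.rm = TRUE skips NAs
  means <- data %>%
    group_by(across(all_of(c(x, colour)))) %>%
    summarise(mean_y = mean(.data[[y]], na.rm = TRUE), .groups = "drop")

  ggplot(data, aes(x = .data[[x]], y = .data[[y]], fill = .data[[colour]])) +
    geom_boxplot(width = 0.7, position = position_dodge(width = 0.8)) +
    # x and fill are inherited; means has columns with the same names
    geom_text(data = means,
              aes(y = mean_y, label = round(mean_y, 3)),
              position = position_dodge(width = 0.8),
              size = 3, vjust = -0.5) +
    labs(x = "Dimensions", y = "Dimension Score", fill = colour) +
    scale_fill_grey(start = 0.3, end = 0.7)
}

plot1 <- plotgraph(dims_Blog, "Dimension", "BlogsInd.", "Region")
print(plot1)
```

The call mirrors `plotgraph("Dimension", "BlogsInd.", "Region")` from the question, with the data frame passed explicitly.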

Taeke
  • Thanks for the answer. There is a problem with NAs: the data frame is structured so that, in each row, only one of the blog columns holds a numeric value while the others are NA, so none of those columns is entirely numeric. I tried `summarise(mean = mean(y = y %>% na.omit()))` with no luck. – Shakir Feb 12 '17 at 19:26
  • I don't fully understand the problem you're referring to: there are no `NA`s in the BlogsInd. column of your data sample. To exclude NAs from the mean, you can use `mean(y, na.rm = TRUE)`. Similarly, if you would like to select the (only?) numerical value in some columns, you can use `max(col1, col2, ..., na.rm = TRUE)`. – Taeke Feb 12 '17 at 20:06
  • Sorry, there was an issue with passing non-numeric values to round(). Now it simply does not recognize the function. It looks like there is a problem with the R runtime; I will report back after solving the issue. – Shakir Feb 12 '17 at 20:26
  • `plotgraph <- function(x, y, colour) { plot1 <- ggplot(dims_Blog, aes_string(x = x, y = y, fill = colour)) + geom_boxplot()+ labs(color=colour) + labs(x="Dimensions", y="Dimension Score") + scale_fill_grey(start = 0.3, end = 0.7) + theme_grey()+ theme(legend.justification = c(1, 1), legend.position = c(1, 1)) + geom_text(data = dims_Blog %>% group_by_(x, colour) %>% summarise(mean=mean(y)), aes_string(x=x, y="mean", label="round(mean,3)"), position=position_dodge(width=0.8), size = 3, vjust = -0.5, colour="white"); return(plot1) }` – Shakir Feb 12 '17 at 20:56
  • Calling `plot1 <- plotgraph("Dimension", "BlogsInd.", "Region")` returns blank. If `summarise_` is used instead, I get "argument is not numeric in mean.default". – Shakir Feb 12 '17 at 20:58
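One likely cause of the blank result in the function posted in the comments: inside `summarise(mean = mean(y))`, `y` is not a column of `dims_Blog`, so it resolves to the function argument, i.e. the string `"BlogsInd."`. `mean()` of a character string warns "argument is not numeric or logical" and returns `NA`, so every label is `NA` and `geom_text()` drops them. A minimal sketch of the pitfall and the fix on a hypothetical toy data frame, assuming dplyr >= 1.0.0:

```r
library(dplyr)

# Hypothetical toy data: a grouping column g and a numeric column v
df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))
y <- "v"  # column name held in a string, as inside plotgraph()

# Wrong: y is the character string "v", so mean() receives a string,
# warns and returns NA (the warning is suppressed here)
bad <- df %>%
  group_by(g) %>%
  summarise(m = suppressWarnings(mean(y)), .groups = "drop")

# Right: look the column up by name with the .data pronoun
good <- df %>%
  group_by(g) %>%
  summarise(m = mean(.data[[y]], na.rm = TRUE), .groups = "drop")

print(bad)   # m is NA for both groups
print(good)  # m is 1.5 for "a" and 3 for "b"
```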