
Sample of the dataset.

nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305

I would like to make a percentage bar plot/histogram like the one in this question: RE: Alignment of numbers on the individual bars with ggplot2

The maximum value of nq in the full dataset is 21 and the minimum is 0.00005.

However, I am unable to adapt that code because I don't have a Freq column, only a single series.

I have made a mockup of the figure I am trying to make.

Could you please help?

Geekuna Matata

2 Answers


Would that work for you?

nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = FALSE) # Your data

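library(ggplot2)

# Cut the values into 5 equal-width bins, then count the cases per bin
nq$V2 <- cut(nq$V1, 5, include.lowest = TRUE)
nq2 <- aggregate(V1 ~ V2, nq, length)
# Share of the total in each bin (used for the percentage labels)
nq2$V3 <- nq2$V1 / sum(nq2$V1)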
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) + 
  geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2,  y = V1 + 0.4), size = 5) + 
  theme_bw() +
  scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
  ylab("No. of Cases") + xlab("") +
  scale_y_continuous(expand = c(0,0)) +
  theme(
    axis.title.y = element_text(size = 20, face = "bold", angle = 0), 
    panel.grid.major = element_blank() ,
    panel.grid.minor = element_blank() ,
    panel.border = element_blank() ,
    panel.background = element_blank(),
    axis.line = element_line(color = 'black', size = 2),
    axis.text.x = element_text(face="bold"),
    axis.text.y = element_text(face="bold")
    ) 
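
To check the numbers behind the plot, here is a minimal base R sketch of the binning step on its own. It assumes the same 17 sample values; the names `x`, `bins`, `counts`, and `pct` are illustrative and not part of the code above.

```r
# Same 17 sample values as in the question
x <- c(0.140843018, 0.152855833, 0.193245919, 0.156860105, 0.171658019,
       0.186281942, 0.290739146, 0.162779517, 0.164694042, 0.171658019,
       0.195866609, 0.166967913, 0.136841748, 0.108907644, 0.264136384,
       0.356655651, 0.250508305)

# Five equal-width bins, as in cut(nq$V1, 5, include.lowest = TRUE)
bins <- cut(x, 5, include.lowest = TRUE)

# Count per bin and share of the total, in percent
counts <- table(bins)
pct <- round(100 * prop.table(counts), 1)

print(data.frame(bin = names(counts),
                 count = as.vector(counts),
                 percent = as.vector(pct)))
```

One difference worth knowing: `table()` keeps bins with zero cases, while `aggregate()` with a formula drops them, so on a wider dataset the two can return a different number of rows.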


David Arenburg

I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot as it looks like @DavidArenburg has done. But, if you feel like hacking ggplot, here's what I ended up doing.

First, some sample data.

set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0,25, by=5) # break points

My first attempt was

library(ggplot2)
ggplot(dd, aes(x)) + 
    stat_bin(position="stack", breaks=br) + 
    geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1), 
        vjust=-.5, breaks=br, stat="bin")

but that didn't make "pretty labels": the proportions were printed as raw decimals.
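
The label expression itself is sound. Density is count / (n × bin width), so multiplying by the bin width leaves count / n, the share of observations in each bin. Here is a minimal base R check of that identity, using `hist()` as a stand-in for `stat_bin` (an assumption for illustration; the two may treat values that fall exactly on a break differently):

```r
set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)  # break points

# Bin without plotting; hist() returns counts, density and breaks
h <- hist(dd$x, breaks = br, plot = FALSE)
width <- diff(h$breaks)

# density * width is the per-bin proportion
prop <- h$density * width
stopifnot(isTRUE(all.equal(prop, h$counts / sum(h$counts))))

# Raw proportions versus a formatted percent label
print(prop)
print(sprintf("%.0f%%", 100 * prop))
```

So the only thing missing is the formatting of the label, not the quantity being computed.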

So I thought I'd use the percent() function from the scales package to make it pretty. However, ggplot doesn't really make it possible to use arbitrary functions with ..().. variables, because it evaluates them in the data.frame only and then in baseenv(), which holds base R but not functions from packages such as scales. It has no way to find the function you use. So this is when I turned to hacking. First I'll extract the "Layer" definition from ggplot and the map_statistic function from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases.)

orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))

Here's the line that's causing grief. I would prefer that the function resolve names not found in the data.frame by looking in the plot environment. So I decided to change it with

body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
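
The scoping problem can be reproduced without ggplot. In this sketch, `fmt_pct` is a hypothetical stand-in for `percent()`, `plot_env` is a throwaway environment playing the role of `plot$plot_env`, and `stat_df` stands in for the computed statistics. With `baseenv()` as the enclosure the function is not found; with the plot environment it is:

```r
# Stand-in for the statistics computed by stat_bin (values are made up)
stat_df <- data.frame(density = c(0.08, 0.12), width = c(5, 5))

# Stand-in for the environment where the plot was created
plot_env <- new.env()
plot_env$fmt_pct <- function(p) paste0(round(100 * p), "%")

expr <- quote(fmt_pct(density * width))

# Data frame first, then baseenv(): fmt_pct is not found, so capture the error
res_base <- tryCatch(eval(expr, stat_df, baseenv()),
                     error = function(e) conditionMessage(e))

# Data frame first, then plot_env: fmt_pct is found
res_plot <- eval(expr, stat_df, plot_env)

print(res_base)
print(res_plot)
```

The patched line corresponds to the second `eval()` call: column names still come from the data.frame, and anything else is looked up in the plot environment.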

Now I can use functions with ..().. variables, so I can do

library(scales)
ggplot(dd, aes(x)) + 
    stat_bin(position="stack", breaks=br) + 
    geom_text(aes(y=..count.., ymax=..count..+2, 
        label=percent(..density..*..width..)), 
        vjust=-.5, breaks=br, stat="bin")

to get percent-formatted labels on the bars.

I'm not sure why ggplot has this default behavior. There could be a good reason for it, but I don't know what it is. Note that this changes how ggplot behaves for the rest of the session. You can change back to the default with

assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)
MrFlick
  • Perhaps @hadley might be able to explain why it only searches the calculated data.frame and doesn't continue searching in the plot environment. – MrFlick Jul 03 '14 at 07:22