I am new to R and ggplot2. I have searched a lot for this but could not find a solution.

Sample  observation1    observation2    observation3    percentage
sample1_A   163453473   131232689   61984186    30.6236955883
Sample1_B   170151351   137202212   59242536    26.8866816109
sample2_A   194102849   162112484   89158170    40.4183031852
sample2_B   170642240   141888123   79925652    41.7493687378
sample3_A   192858504   161227348   90532447    41.8068248626
sample3_B   177174787   147412720   81523935    40.5463120438
sample4_A   199232380   174656081   118115358   55.6409038531
sample4_B   211128931   186848929   123552556   54.7201927527
sample5_A   186039420   152618196   87012356    40.9656544833
sample5_B   145855252   118225865   66265976    39.5744515254
sample6_A   211165202   186625116   112710053   48.5457722338
sample6_B   220522502   193191927   114882014   47.238670909

I want to draw a bar plot with ggplot2: the first three columns as dodged bars (position = "dodge"), with the observation3 bar labelled with the value from the percentage column. I can plot the bars as shown below, but I could not work out how to add the labels with geom_text().

library(ggplot2)
library(reshape2)
data1 <- read.table("readStats.txt", header = TRUE)
data1.long <- melt(data1)  # Sample is the id; rows 1:36 hold observation1..observation3
ggplot(data1.long[1:36, ], aes(x = Sample, y = value, fill = variable)) +
  geom_bar(stat = "identity", width = 0.5, position = "dodge")
gthm

2 Answers

Transform data1 to long form with the three observation columns as the measure variables and the Sample and percentage columns as the id variables. Compute the maximum observation value, mx, which is used as the y position of the percentage labels. Then draw the plot. Note that geom_bar uses data1.long but geom_text uses data1, so each percentage is drawn once per sample rather than once per bar. Both layers inherit aes(x = Sample) but use different y and other aesthetics. The percentage text is given the same color as the observation3 bars. (See this post for how to specify default colors.) Optionally, we clean up the X axis labels by removing all lower case letters and underscores from data1$Sample.

library(ggplot2)
library(reshape2)

# Columns 2:4 (observation1..observation3) are measured; Sample and percentage remain ids
data1.long <- melt(data1, measure.vars = 2:4)
mx <- max(data1.long$value)  # maximum observation value, used as the label height

ggplot(data1.long, aes(x = Sample, y = value)) +
  geom_bar(aes(fill = variable), stat = "identity", width = 0.5, position = "dodge") +
  # one label per sample, taken from the wide data1 and drawn just above y = mx
  geom_text(aes(y = mx, label = paste0(round(percentage), "%")), data = data1,
            col = "#619CFF", vjust = -0.5) +
  scale_x_discrete(labels = gsub("[a-z_]", "", data1$Sample))
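
The hard-coded "#619CFF" could also be computed rather than typed in. The following is only a sketch, assuming ggplot2's default discrete palette is n equally spaced hues on the HCL colour wheel; the helper name gg_color_hue is our own and is not a ggplot2 function. For n = 3 the third element is expected to correspond to the observation3 fill.

```r
# Hypothetical helper (not part of ggplot2): approximates the default
# discrete palette with n equally spaced hues at fixed chroma and luminance.
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)  # n + 1 points; the last wraps onto the first
  hcl(h = hues, c = 100, l = 65)[1:n]       # drop the wrapped duplicate
}

obs_cols <- gg_color_hue(3)  # one colour per observation column
label_col <- obs_cols[3]     # colour intended for the percentage labels
print(obs_cols)
```

label_col can then be passed to geom_text in place of the literal colour string.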

Note: We used the data below. The one occurrence of Sample1_B in the question was changed to sample1_B, with a lower case s, so that all sample names are consistent:

Lines <- "Sample  observation1    observation2    observation3    percentage
sample1_A   163453473   131232689   61984186    30.6236955883
sample1_B   170151351   137202212   59242536    26.8866816109
sample2_A   194102849   162112484   89158170    40.4183031852
sample2_B   170642240   141888123   79925652    41.7493687378
sample3_A   192858504   161227348   90532447    41.8068248626
sample3_B   177174787   147412720   81523935    40.5463120438
sample4_A   199232380   174656081   118115358   55.6409038531
sample4_B   211128931   186848929   123552556   54.7201927527
sample5_A   186039420   152618196   87012356    40.9656544833
sample5_B   145855252   118225865   66265976    39.5744515254
sample6_A   211165202   186625116   112710053   48.5457722338
sample6_B   220522502   193191927   114882014   47.238670909"

data1 <- read.table(text = Lines, header = TRUE)
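
To see what the long form looks like before plotting, here is a small sketch on two rows copied from the data above. The names small and small.long are ours and are used only for illustration.

```r
library(reshape2)

# Two rows copied from the data above, enough to show the shape of the result
small <- read.table(text = "Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
sample1_B 170151351 137202212 59242536 26.8866816109", header = TRUE)

# Sample and percentage stay as id columns; the observation columns are
# stacked into a variable/value pair
small.long <- melt(small,
                   id.vars = c("Sample", "percentage"),
                   measure.vars = c("observation1", "observation2", "observation3"))

str(small.long)               # 2 samples x 3 observation columns = 6 rows
mx <- max(small.long$value)   # the tallest bar, used as the y position for the labels
```

Because percentage is repeated on every row of the long form, taking the labels from the wide data frame instead avoids drawing each one three times.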

G. Grothendieck

It might be that G. Grothendieck's answer is a better solution, but here is my suggestion (code below):

# install.packages("ggplot2", dependencies = TRUE)
require(ggplot2)

df <- structure(list(Sample = structure(1:12, .Label = c("sample1_A", 
"Sample1_B", "sample2_A", "sample2_B", "sample3_A", "sample3_B", 
"sample4_A", "sample4_B", "sample5_A", "sample5_B", "sample6_A", 
"sample6_B"), class = "factor"), observation1 = c(163453473L, 
170151351L, 194102849L, 170642240L, 192858504L, 177174787L, 199232380L, 
211128931L, 186039420L, 145855252L, 211165202L, 220522502L), 
    observation2 = c(131232689L, 137202212L, 162112484L, 141888123L, 
    161227348L, 147412720L, 174656081L, 186848929L, 152618196L, 
    118225865L, 186625116L, 193191927L), observation3 = c(61984186L, 
    59242536L, 89158170L, 79925652L, 90532447L, 81523935L, 118115358L, 
    123552556L, 87012356L, 66265976L, 112710053L, 114882014L), 
    percentage = c(30.6236955883, 26.8866816109, 40.4183031852, 
    41.7493687378, 41.8068248626, 40.5463120438, 55.6409038531, 
    54.7201927527, 40.9656544833, 39.5744515254, 48.5457722338, 
    47.238670909)), .Names = c("Sample", "observation1", "observation2", 
"observation3", "percentage"), class = "data.frame", row.names = c(NA, 
-12L))

# install.packages("reshape2", dependencies = TRUE)
require(reshape2)

# Keep percentage as an id variable so it survives the melt
data1.long <- melt(df, id.vars = c("Sample", "percentage"),
                   measure.vars = c("observation1", "observation2", "observation3"))

# Format the label, then blank it for every bar except observation3
data1.long$percentage <- paste(round(data1.long$percentage, 2), "%", sep = "")
data1.long[data1.long$variable %in% c("observation1", "observation2"), "percentage"] <- ""

ggplot(data1.long, aes(x = Sample, y = value, fill = variable)) +
  geom_bar(stat = "identity", width = 0.5, position = "dodge") +
  geom_text(aes(label = percentage), vjust = 2.10, size = 2, hjust = -.06, angle = 90)
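
As a possible variation (not part of the answer above), the labels can be dodged with the same position object as the bars instead of being nudged by hand with hjust and vjust. This is a minimal sketch on made-up toy data; the data frame toy and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical long-form data: the label is already blank for all bars
# except observation3
toy <- data.frame(
  Sample   = rep(c("s1", "s2"), each = 3),
  variable = rep(c("observation1", "observation2", "observation3"), times = 2),
  value    = c(163, 131, 62, 170, 137, 59),
  label    = c("", "", "31%", "", "", "27%")
)

# One dodge specification shared by both layers; its width equals the bar
# width so that each text sits over its own bar
dodge <- position_dodge(width = 0.5)

p <- ggplot(toy, aes(x = Sample, y = value, fill = variable)) +
  geom_bar(stat = "identity", width = 0.5, position = dodge) +
  geom_text(aes(label = label), position = dodge, vjust = -0.5, size = 3)
p
```

Sharing one dodge object keeps the two layers aligned if the width is changed later.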
Eric Fail