
I'm trying to make a bar plot with ggplot2.

I have several data frames (example data below).

Each data frame has a column "count", and many of its values are 0.

I want a bar plot of my data that leaves the zeros out of the visualization but keeps the original percentages.

For example, if 80% of my data are 0, I want to plot only the values != 0 but have the y axis top out at 20%. That way I can easily see the non-zero data and still keep the information about the zeros. If you have a better suggestion for representing my data, I'm open to it.

My other problem is that I want to merge some of the "count" values into groups: the plot should show count=1, count=2 and count>=3, and I don't know how to do that. Maybe by making a count matrix?

Here is some example data:

library(ggplot2)

#Stackoverflow example
data1=data.frame(count=c(rep(0,70),rep(1,15),rep(2,10),rep(3,3),5,7))
data2=data.frame(count=c(rep(0,140),rep(1,30),rep(2,20),rep(3,6),5,5,7,7))
data3=data.frame(count=c(rep(0,120),rep(1,20),rep(2,7),5,7,9))

data1$var="first"
data2$var="second"
data3$var="third"

all_df=rbind(data1,data2,data3)

#Plot all values : Plot 1
ggplot(all_df) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)


#Plot value greater than 0 : Plot 2
ggplot(all_df[which(all_df$count>0),]) +
geom_bar(aes(x = var, fill = as.factor(count)), position = "fill")+
scale_y_continuous(labels=scales::percent)

This is what I get with all the data:

[Image: Plot 1, all values]

Plot 2 is my attempt at excluding the zeros, but I don't know how to keep the information about the missing zeros (up to 80% of the data). Instead of 100% at the top of the y axis, I want it to show 1 - (% count==0).

[Image: Plot 2, values greater than 0]
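The target height 1 - (% count==0) can be computed per data set in one line, because the mean of a logical vector is the share of TRUE values. A minimal base R sketch on the example data (`nonzero_share` is a name introduced here for illustration):

```r
# Example data from the question
data1 <- data.frame(count = c(rep(0, 70), rep(1, 15), rep(2, 10), rep(3, 3), 5, 7))
data2 <- data.frame(count = c(rep(0, 140), rep(1, 30), rep(2, 20), rep(3, 6), 5, 5, 7, 7))
data3 <- data.frame(count = c(rep(0, 120), rep(1, 20), rep(2, 7), 5, 7, 9))
data1$var <- "first"
data2$var <- "second"
data3$var <- "third"
all_df <- rbind(data1, data2, data3)

# Mean of (count != 0) per data set = 1 - (% count == 0)
nonzero_share <- tapply(all_df$count != 0, all_df$var, mean)
print(nonzero_share)  # first: 0.3, second: 0.3, third: 0.2
```

These are the heights the visible part of each bar should reach once the zeros are hidden.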

I also want to group count>=3, so that instead of listing every value in the legend (1, 2, 3, 5, 7, 9) it shows only 1, 2 and >=3.

To do that, I thought of building a count table in a new data frame: for each data frame, count the rows with count=0, count=1, count=2 and count>=3. But from there I don't know how to continue. Here is what I tried:

count_df=function(a,b,c){
data.frame(first=c(sum(a$count==0),sum(a$count==1),sum(a$count==2),sum(a$count>=3)),
second=c(sum(b$count==0),sum(b$count==1),sum(b$count==2),sum(b$count>=3)),
third=c(sum(c$count==0),sum(c$count==1),sum(c$count==2),sum(c$count>=3)))
}

count_table=count_df(data1,data2,data3)
rownames(count_table)=c("0","1","2",">=3")
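The count table above can also be built directly, and turned into plot-ready proportions, with a few base R calls. A minimal sketch, assuming the bins 0, 1, 2 and >=3 from the question; `count_bin`, `prop_table` and `props` are hypothetical names introduced here. `cut()` does the grouping, `table()` replaces `count_df`, and the proportions are taken relative to all rows (zeros included) before the zero bin is dropped:

```r
# Example data from the question
data1 <- data.frame(count = c(rep(0, 70), rep(1, 15), rep(2, 10), rep(3, 3), 5, 7))
data2 <- data.frame(count = c(rep(0, 140), rep(1, 30), rep(2, 20), rep(3, 6), 5, 5, 7, 7))
data3 <- data.frame(count = c(rep(0, 120), rep(1, 20), rep(2, 7), 5, 7, 9))
data1$var <- "first"
data2$var <- "second"
data3$var <- "third"
all_df <- rbind(data1, data2, data3)

# Bin the counts: right-closed intervals (-Inf,0], (0,1], (1,2], (2,Inf]
# map integer counts to the labels 0, 1, 2 and >=3
all_df$count_bin <- cut(all_df$count, breaks = c(-Inf, 0, 1, 2, Inf),
                        labels = c("0", "1", "2", ">=3"))

# Count table: one row per bin, one column per data set (the layout of count_df)
count_table <- table(count_bin = all_df$count_bin, var = all_df$var)

# Column proportions, i.e. relative to ALL rows of each data set, zeros included
prop_table <- prop.table(count_table, margin = 2)

# Long format (columns count_bin, var, prop), then drop the zero bin;
# the remaining proportions per data set sum to 1 - (% count == 0)
props <- as.data.frame(prop_table, responseName = "prop")
props <- props[props$count_bin != "0", ]
print(props)
```

Plotting `props` with `ggplot(props, aes(x = var, y = prop, fill = count_bin)) + geom_col() + scale_y_continuous(labels = scales::percent)` stacks the bars to 30%, 30% and 20%, so the y axis shows the non-zero share directly instead of 100%.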
Nono_sad

1 Answer


You could set the color of the zero count to transparent. This way you do not need to change your data.frame at all.

Using the handy gg_color_hue function (which emulates ggplot2's default color palette), you can then do this:

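# Emulate ggplot2's default hue palette for n colors
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

# Non-zero count values, sorted so they line up with the factor levels
counts <- sort(unique(all_df$count))
counts <- counts[counts != 0]
# "0" is the first factor level, so it gets the transparent fill
colors <- c('transparent', gg_color_hue(length(counts)))

#Plot all values : Plot 1
ggplot(all_df) +
  geom_bar(aes(x = var, fill = as.factor(count)), position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  # Leaving 0 out of breaks hides it from the legend
  scale_fill_manual(values = colors, breaks = as.character(counts))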

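The transparent-zero trick can be combined with the 1, 2, >=3 grouping the question asks for. A sketch, assuming ggplot2 2.2.0 or later (where `position = "fill"` stacks the first factor level on top, so the transparent "0" segment sits above the visible bars) and the scales package; `count_bin` and `bin_colors` are illustrative names, and `scales::hue_pal()` stands in for `gg_color_hue`:

```r
library(ggplot2)

# Example data from the question
data1 <- data.frame(count = c(rep(0, 70), rep(1, 15), rep(2, 10), rep(3, 3), 5, 7))
data2 <- data.frame(count = c(rep(0, 140), rep(1, 30), rep(2, 20), rep(3, 6), 5, 5, 7, 7))
data3 <- data.frame(count = c(rep(0, 120), rep(1, 20), rep(2, 7), 5, 7, 9))
data1$var <- "first"
data2$var <- "second"
data3$var <- "third"
all_df <- rbind(data1, data2, data3)

# Group the counts into 0, 1, 2 and >=3 ("0" is the first level)
all_df$count_bin <- cut(all_df$count, breaks = c(-Inf, 0, 1, 2, Inf),
                        labels = c("0", "1", "2", ">=3"))

# Named palette: "0" is transparent, the other bins get default-style hues;
# naming the values makes the mapping independent of level order
bin_colors <- c("0" = "transparent",
                setNames(scales::hue_pal()(3), c("1", "2", ">=3")))

p <- ggplot(all_df) +
  geom_bar(aes(x = var, fill = count_bin), position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  # Leaving "0" out of breaks hides it from the legend
  scale_fill_manual(values = bin_colors, breaks = c("1", "2", ">=3"))
```

Printing `p` draws the chart: the y axis still runs to 100%, but the visible stack in each bar ends at the non-zero share (30%, 30% and 20% here) and the legend lists only 1, 2 and >=3.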

Simon