I want to create a specific barplot with ggplot. Here is what I've got so far:

ggplot(only_savings, aes(DivisionName,  Total_CR)) +
geom_bar(stat="summary", fun.y="sum")

[Image: Total_CR on Y with 1 bar]

As you can see, there are 2 divisions: Electrification Products and Power Grids. The Y-axis shows the numeric savings, summed up (Total_CR, total cost reduction). But I would like to split each bar into 2 parts: Repetitive_Savings and MDF_Savings. So it would look like this:

[Image: Total_CR on Y with divided bars]

And here is the data: (Ok, I can't post a screenshot, so I'll paste some rows)

DivisionName                Repetitive_Savings       MDF_Savings    Total_CR
Power Grids                 86.571656                0              86.571656
Power Grids                 183.461221               0              183.461221
Power Grids                 2326.963118              0              2326.963118
Electrification Products    1249.323277              0              1249.323277
Electrification Products    6.849336                 0              6.849336
Electrification Products    3.808845                 0              3.808846

DivisionName is a factor; the other 3 columns are numeric. How can I get the barplot that I've sketched in Paint?

hrbrmstr

2 Answers

Read in data

I changed your example a little, since values of 0 aren't going to show anything for us.

only_savings <- read.table(header = TRUE, text = "
DivisionName                Repetitive_Savings       MDF_Savings    Total_CR
'Power Grids'                 86.571656                500              86.571656
'Power Grids'                 183.461221               500              183.461221
'Power Grids'                 2326.963118              500              2326.963118
'Electrification Products'    1249.323277              500              1249.323277
'Electrification Products'    6.849336                 500              6.849336
'Electrification Products'    3.808845                 500              3.808846
")

Reshape

ggplot works best with data in long, or 'tidy', form: each observation should be a separate row, with an additional column telling us whether that row belongs to Repetitive or MDF. One easy way to do that is with the tidyr package.

We also have to filter out the Total_CR rows, since they don't need to be plotted.

library(tidyr)
pd <- gather(only_savings, 'key', 'value', -DivisionName)
pd <- pd[pd$key != 'Total_CR', ]
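
For readers on a newer tidyr (1.0.0 or later), the same reshape can be sketched with `pivot_longer()`, which supersedes `gather()`. This is only an illustration, not part of the original answer: the three sample rows are copied from the example above, and the name `pd_long` is made up here.

```r
library(tidyr)

# Three sample rows copied from the example data above
only_savings <- data.frame(
  DivisionName       = c("Power Grids", "Power Grids", "Electrification Products"),
  Repetitive_Savings = c(86.571656, 183.461221, 1249.323277),
  MDF_Savings        = c(500, 500, 500),
  Total_CR           = c(86.571656, 183.461221, 1249.323277)
)

# Drop Total_CR up front, then turn the two savings columns into key/value pairs
savings_only <- only_savings[, names(only_savings) != "Total_CR"]
pd_long <- pivot_longer(
  savings_only,
  cols      = -DivisionName,
  names_to  = "key",
  values_to = "value"
)
```

Dropping Total_CR before reshaping avoids the separate filtering step, assuming the total is never needed in the plot.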

Create the plot

Now all that is left to do is to assign a fill colour to key.

library(ggplot2)
ggplot(pd, aes(DivisionName,  value, fill = key)) +
  geom_bar(stat = "summary", fun.y = "sum")

Note that we can also write it as follows, since stacking the individual observations gives the same bar heights as summing them first.

ggplot(pd, aes(DivisionName,  value, fill = key)) +
  geom_bar(stat = "identity")
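
If the bar heights look off, one way to check them is to compute the sums directly and compare them with the plot. The following is a minimal sketch, assuming a long-format data frame shaped like `pd`; the eight sample rows are taken from the example values above, and `segment_sums` and `bar_totals` are names invented for this illustration. It also uses `fun` instead of `fun.y`, since `fun.y` is deprecated from ggplot2 3.3.0 onwards.

```r
library(ggplot2)

# Small long-format sample in the shape of `pd` (values from the example above)
pd <- data.frame(
  DivisionName = rep(c("Power Grids", "Electrification Products"), each = 4),
  key          = rep(c("Repetitive_Savings", "MDF_Savings"), times = 4),
  value        = c(86.571656, 500, 183.461221, 500,
                   1249.323277, 500, 6.849336, 500)
)

# Sum per division and key: the height of each coloured segment
segment_sums <- aggregate(value ~ DivisionName + key, data = pd, FUN = sum)

# Sum per division: the full height of each stacked bar
bar_totals <- aggregate(value ~ DivisionName, data = pd, FUN = sum)

# Same plot as above, written with the newer argument name
p <- ggplot(pd, aes(DivisionName, value, fill = key)) +
  geom_bar(stat = "summary", fun = "sum")
```

If a value in `bar_totals` is larger than expected, the usual suspect is a total column that was left in the long data and is being stacked on top of its own parts.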

Result

[Image: the resulting stacked bar plot]

Axeman
  • Total_CR is the sum of MDF_Savings + Repetitive_Savings. So I'm not sure if only the Legend is wrong or also the meaning behind it. Since Total_CR should be the whole Bar and the 2 splits should be MDF_Savings and Repetitive_Savings. – Pixelements Aug 19 '16 at 12:20
  • @Pixelements Sorry, filtered out the wrong one. I've amended the error. – Axeman Aug 19 '16 at 12:24
  • Great! I could have thought of that myself. Is it also possible to add % to the bars? Let's say it would be 55% MDF_Savings and 45% Repetitive_Savings on the left and 30%, 70% on the right? Since the key is a character in pd? If yes, how? – Pixelements Aug 19 '16 at 12:32
  • You can't easily do double axes in `ggplot`. See for example [here](http://stackoverflow.com/questions/3695497/ggplot-showing-instead-of-counts-in-charts-of-categorical-variables) for percentages. – Axeman Aug 19 '16 at 12:35
  • Here I am again. It does not seem to work. Can I hand you some values, so you could test it with the "real" data? It executes everything and plots a bar, but the values don't seem to be right. For instance, the total of Repetitive_Savings in the data is roughly 11.000.000, whereas MDF_Savings is 4.000.000. So the total savings shouldn't be more than 15, but it is in the bar. Can you imagine why? – Pixelements Sep 21 '16 at 10:23
  • @Pixelements. There is little I can do here. Please post a _reproducible_ example as a new question. – Axeman Sep 21 '16 at 10:40
  • I've posted a new question: http://stackoverflow.com/questions/39619490/rstudio-reorder-stacked-ggplot-geom-bar – Pixelements Sep 21 '16 at 14:32
What you want to do is introduce a categorical variable in which the two fields Repetitive_Savings and MDF_Savings are groups.

Your data is not in the right format for that as it stands.

You can reformat it as follows (shown here on the first 100 rows of the diamonds dataset):

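library(ggplot2)  # provides the diamonds dataset

# Convert to a plain data frame so that [, i] returns a vector
test.df <- as.data.frame(diamonds[1:100, ])
test.df <- test.df[, c(2, 5, 6)]  # cut, depth, table
test.df$total <- test.df[, 2] + test.df[, 3]
head(test.df)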
        cut depth table total
1     Ideal  61.5    55 116.5
2   Premium  59.8    61 120.8
3      Good  56.9    65 121.9
4   Premium  62.4    58 120.4
5      Good  63.3    58 121.3
6 Very Good  62.8    57 119.8

Colnames <- colnames(test.df)

# For every row, build three rows: one per value column (depth, table, total)
NewData.list <- lapply(1:nrow(test.df), function(x) {
    Row <- test.df[x, ]
    data.frame(DivisionName = Row[[1]],
               Values = c(Row[[2]], Row[[3]], Row[[4]]),
               Categories = Colnames[c(2, 3, 4)])
})

NewData.df <- do.call(rbind, NewData.list)

  DivisionName Values Categories
1        Ideal   61.5      depth
2        Ideal   55.0      table
3        Ideal  116.5      total
4      Premium   59.8      depth
5      Premium   61.0      table
6      Premium  120.8      total
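
The row-by-row loop above can also be sketched without `lapply`, using only base R. This is an illustrative alternative rather than the answer's own method: the three sample rows are copied from the output shown earlier, `wide`, `value_cols` and `long` are names invented here, and the total column is left out on the assumption that the stacked bar should itself add up to the total.

```r
# Hypothetical wide-format sample; values copied from the first rows shown above
wide <- data.frame(
  cut   = c("Ideal", "Premium", "Good"),
  depth = c(61.5, 59.8, 56.9),
  table = c(55, 61, 65)
)

# Columns that become the stacked categories (assumption: total is excluded)
value_cols <- c("depth", "table")

# Stack the value columns on top of each other and label each block
long <- data.frame(
  DivisionName = rep(wide$cut, times = length(value_cols)),
  Values       = unlist(wide[value_cols], use.names = FALSE),
  Categories   = rep(value_cols, each = nrow(wide))
)
```

Because the whole reshape happens in three vectorised calls, this scales better than binding one small data frame per row.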

Then plot

NewData.df$Categories <- factor(NewData.df$Categories, levels = unique(NewData.df$Categories))
NewData.df <- NewData.df[order(NewData.df$Categories), ]

Plot <- ggplot(NewData.df, aes(x = DivisionName, y = Values, group = Categories)) +
  geom_bar(stat = "identity", aes(fill = Categories), colour = "#000000")

ggsave(filename = "Test.png", plot = Plot)

[Image: the resulting plot]

FoldedChromatin