
We have struggled with the following for many hours before posting this question.

We have a huge number of data-sets similar to the following:

Income                 Inhabitants              Percent
Below  15000            Below 5000              4.664723
15000 - 3.000           Below 5000              15.743440
30000 - 40000           Below 5000              13.994169
40000 - 50000           Below 5000              12.609329
50000 - 60000           Below 5000              11.333819
60000 - 70000           Below 5000              11.370262
70000 - 100000          Below 5000              14.795918
Above  100000           Below 5000              5.211370
Do not know             Below 5000              10.276968
Below  15000            5000-20000              4.225146
15000 - 3.000           5000-20000              13.157895
30000 - 40000           5000-20000              12.733918
40000 - 50000           5000-20000              11.739766
60000 - 70000           5000-20000              11.315789
70000 - 100000          5000-20000              18.728070
Above  100000           5000-20000              7.880117
Do not know             5000-20000              9.356725
Below  15000            20000-110000            4.013588
15000 - 3.000           20000-110000            11.147458
30000 - 40000           20000-110000            11.927529
40000 - 50000           20000-110000            11.751384
50000 - 60000           20000-110000            9.738299
60000 - 70000           20000-110000            10.367388
70000 - 100000          20000-110000            17.929039
Above  100000           Above 110000            13.198289
Do not know             Above 110000            9.927026
Below  15000            Above 110000            4.662941
15000 - 3.000           Above 110000            10.286413
30000 - 40000           Above 110000            11.054838
40000 - 50000           Above 110000            10.513447
50000 - 60000           Above 110000            9.081383
60000 - 70000           Above 110000            8.539993
70000 - 100000          Above 110000            18.389801
Above  100000           Above 110000            18.040517
Do not know             Above 110000            9.430667

We want to make stacked bars of the data, showing the distribution between areas.

This did it:

library(ggplot2)
g=ggplot(data=frame, aes(x=Inhabitants, ymax=100, y=Percent,fill=eval(parse(text=special))))
g=g+geom_bar(stat="identity")
g=g+theme_minimal()
g=g+xlab("") + ylab("")
g=g+theme(axis.text.y=element_blank(),axis.ticks.y=element_blank(),axis.ticks.x=element_blank()) 
g=g+scale_fill_discrete("",guide = guide_legend(reverse=TRUE)) 
g

Nice, we are getting exactly what we want. We just want to add some information: what percentage does each section represent?

With the following code we are close:

g=g+geom_text(aes(label = paste(round(Percent,digits=1),"%"),y=Percent),size = 2,hjust = 0.4, vjust = 1.4, position ="stack") 

Getting this: http://s28.postimg.org/lv3zg2cnh/bars2.png

We just want to place the numbers in the middle of the sections. However, it turns out it is very difficult (for us) to do!

We have tried code like the following, with no luck.

data=transform(frame,pos=round(ave(Percent,Inhabitants,FUN=cumsum)-Percent/2))
g=ggplot(data, aes(x=Inhabitants, ymax=100, y=Percent, fill=eval(parse(text=special)))) 
g=g+geom_bar(stat="identity")
g=g+theme_minimal()
g=g+xlab("") + ylab("")
g=g+theme(axis.text.y=element_blank(),axis.ticks.y=element_blank(),axis.ticks.x=element_blank()) 
g=g+scale_fill_discrete("",guide = guide_legend(reverse=TRUE))
g=g+geom_text(aes(label = paste(round(Percent,digits=1),"%"),y=pos),size = 3,hjust = 0.4, vjust = 0, position ="stack") 
g

We have checked SO for solutions, without luck, due to our inexperience with R. After many hours we are now giving up and would be satisfied with our first solution, were it not for the fact that it often turns into a mess when we handle data-sets with more sections: http://s13.postimg.org/5jxavvohz/bars3.png

Our primary question is:

1) How can we prevent labels with values less than 2 percent from appearing?

(Our secondary question is: How can we get the values positioned in the middle? )

aosmith
  • In this answer, the values were centered http://stackoverflow.com/questions/25518741/how-to-plot-a-sophisticated-stacked-barplot-in-ggplot2-without-complicated-ma/25520472#25520472 – Pierre L Oct 02 '15 at 22:15

1 Answer


To avoid labeling the stacks when Percent is less than some value, you can assign your positioning variable to NA for those cases.

For example, you could do this via ifelse and transform after creating the pos variable via cumsum as you did in your question. I am using 5 as the cut-off in this example, as no Percent in your example data is less than 2.

data = transform(data, pos2 = ifelse(Percent < 5, NA, pos))

Now just use pos2 as your y aesthetic in geom_text and you will not have text labels when Percent is less than 5. Remove position = "stack" from geom_text to get your labels centered.
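To make the arithmetic concrete, here is a minimal sketch on a made-up three-row data frame (the name toy and its values are hypothetical, not taken from the question). It uses only base R and leaves out the round() call from the question so the midpoints are exact.

```r
# Hypothetical toy data: a single bar made of three sections.
toy <- data.frame(
  Inhabitants = c("A", "A", "A"),
  Income      = c("low", "mid", "high"),
  Percent     = c(50, 4, 46)
)

# Midpoint of each section: running total within the bar minus half the section.
toy <- transform(toy, pos = ave(Percent, Inhabitants, FUN = cumsum) - Percent / 2)

# Set the position to NA for sections below the cut-off (5 here),
# so that geom_text has nowhere to draw those labels.
toy <- transform(toy, pos2 = ifelse(Percent < 5, NA, pos))

print(toy)
# pos  is 25, 52, 77
# pos2 is 25, NA, 77
```

Rows with an NA position are dropped by geom_text, typically with a warning about removed rows; passing na.rm = TRUE to geom_text should silence it.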

Here is what things would look like with your example dataset (using fill = Income because I wasn't sure what fill = eval(parse(text = special)) was doing).

ggplot(data, aes(x = Inhabitants, y = Percent, fill = Income)) +
    geom_bar(stat="identity") +
    theme_minimal() +
    xlab("") + ylab("") +
    theme(axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          axis.ticks.x = element_blank()) +
    scale_fill_discrete("",guide = guide_legend(reverse = TRUE)) +
    geom_text(aes(label = paste(round(Percent, digits = 1),"%"), y = pos2), size = 3) 


As @eipi10 pointed out, another alternative is to use a blank label every time Percent is less than your cut-off. You could do this by using your original position variable and using ifelse inside of geom_text. That line would then look like:

geom_text(aes(label = ifelse(Percent < 5, "", paste(round(Percent, digits = 1),"%")), y = pos), size = 3) 
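As a possible alternative to computing positions by hand, the sketch below lets ggplot2 centre the labels itself. It assumes ggplot2 2.2.0 or later, where position_stack() accepts a vjust argument and geom_col() is available. Those releases also changed the default stacking order, so a hand-computed cumsum position may no longer line up with the bars. The data frame toy and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's data frame.
toy <- data.frame(
  Inhabitants = rep(c("Below 5000", "Above 110000"), each = 3),
  Income      = rep(c("Below 15000", "15000 - 100000", "Above 100000"), times = 2),
  Percent     = c(4, 81, 15, 3, 70, 27)
)

p <- ggplot(toy, aes(x = Inhabitants, y = Percent, fill = Income)) +
  geom_col() +
  geom_text(
    # Blank label below the cut-off (5 here), as in the ifelse variant above.
    aes(label = ifelse(Percent < 5, "", paste(round(Percent, digits = 1), "%"))),
    # vjust = 0.5 places each label in the middle of its stacked section.
    position = position_stack(vjust = 0.5),
    size = 3
  )
p
```

Because the text layer is stacked the same way as the bars, no pos or pos2 column is needed in this variant.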
aosmith
  • This solved our problems in an instant. We will run this code automatically through several datasets, and 'special' is a variable holding a variable name. We use eval(parse(...)) to make R interpret the name as a variable. – user3493503 Oct 02 '15 at 23:20
  • Another way to approach this is to use an `ifelse` inside `geom_text`: `label=ifelse(Percent < 5, "", paste(round(Percent, 1), "%"))` – eipi10 Oct 03 '15 at 01:59
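
On the point about 'special' holding a column name: a minimal sketch of one way to avoid eval(parse(...)), assuming ggplot2 3.0.0 or later, where the .data pronoun can be used inside aes(). The data frame toy and its values are hypothetical; special is set to "Income" here purely for illustration.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(
  Inhabitants = c("A", "A", "B", "B"),
  Income      = c("low", "high", "low", "high"),
  Percent     = c(40, 60, 30, 70)
)

# The column to fill by, held as a string (as when looping over datasets).
special <- "Income"

# .data[[special]] looks the column up by name, without eval(parse(...)).
p <- ggplot(toy, aes(x = Inhabitants, y = Percent, fill = .data[[special]])) +
  geom_col()
p
```

The legend title will default to the expression text, so it may be worth setting it explicitly, for example with labs(fill = special).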