
I have a test dataset like this:

df_test <- data.frame(
  proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
  proj_ID = c(1, 2, 3, 4, 5, 6, 7), 
  stage = c('B','B','B','A','C','A','C'),
  value = c(15,15,20,20,20,70,5)
)

Preparation for viz:

library(dplyr)
library(ggplot2)

input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
  filter(proj_manager=='Emma') %>%
  do({
    proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
    mutate(., proj_value_by_manager =  proj_value_by_manager)
  }) %>%
  group_by(stage) %>%
 do({
    sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
    mutate(.,sum_value_byStage= sum_value_byStage)
  }) %>%
  mutate(count_proj = length(unique(proj_ID))) 

commapos <- function(x, ...) {
  format(abs(x), big.mark = ",", trim = TRUE,
  scientific = FALSE, ...) }

Visualization:

ggplot (input, aes(x=stage, y = count_proj)) + 
  geom_bar(stat = 'identity')+
  geom_bar(aes(y=-proj_value_by_manager), 
      stat = "identity", fill = "Blue") + 
  scale_y_continuous(labels = commapos)+
  coord_flip() +
  ylab('') +
  geom_text(aes(label= sum_value_byStage), hjust = 5) +
  geom_text(aes(label= count_proj), hjust = -1) +
  labs(title = "Emma: 4 projects| $90M Values \n   \n Commitment|Projects") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_hline(yintercept = 0, linetype =1)


My questions are:

  1. Why are the y-values not showing up correctly? For example, C is labeled 20, but its bar reaches nearly 100 on the scale.
  2. How can I adjust the position of the labels so that each one sits at the end of its bar?
  3. How can I rescale the y axis so that both the very short 'count of projects' bars and the long 'project value' bars are displayed well?

Thank you all for the help!

Daisywang

1 Answer


I think your issues are coming from the fact that:

(1) Your dataset has duplicated values, and geom_bar with stat = 'identity' stacks every row in a group, so they all get added together. For example, there are 3 observations for B, each with proj_value_by_manager = 90, which is why the blue bar extends to 270 for that group.

(2) In your second geom_bar you use y = -proj_value_by_manager, but in the geom_text that labels it you use sum_value_byStage. That's why the blue bar for A extends to 90 (since proj_value_by_manager is 90) while the label reads 20.
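The two points above can be checked without drawing anything. This is a minimal sketch in base R, assuming Emma's rows look as they do after the question's pipeline; the names `emma`, `bar_length` and `label_value` are made up for the illustration.

```r
# Emma's rows, with values copied from the question. After the pipeline,
# proj_value_by_manager is 90 on every row.
emma <- data.frame(
  proj_ID = 1:5,
  stage = c("B", "B", "B", "A", "C"),
  value = c(15, 15, 20, 20, 20),
  proj_value_by_manager = 90
)

# With stat = "identity", every row in a group is stacked, so the drawn
# bar length is the per-stage sum of the y column.
bar_length <- tapply(emma$proj_value_by_manager, emma$stage, sum)

# The label column is the per-stage sum of the distinct project values.
label_value <- tapply(emma$value, emma$stage, sum)

print(bar_length)   # A = 90, B = 270, C = 90
print(label_value)  # A = 20, B = 50,  C = 20
```

The bar lengths and the labels come from different columns, and the bar lengths are additionally inflated by the repeated rows.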

To get the chart I believe you want, you could do:

# Q1: De-duplicate the dataset (ignoring proj_ID and value) so geom_bar doesn't erroneously add up repeated rows
input2 <- input[!duplicated(input[,-c(2,4)]),]

ggplot (input2, aes(x=stage, y = count_proj)) + 
  geom_bar(stat = 'identity')+
  geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
           stat = "identity", fill = "Blue") + 
  scale_y_continuous(labels = commapos)+
  coord_flip() +
  ylab('') +
  geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
  geom_text(aes(label= count_proj), hjust = -1) +
  labs(title = "Emma: 4 projects| $90M Values \n   \n Commitment|Projects") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_hline(yintercept = 0, linetype =1)
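As an alternative to removing duplicates afterwards, the per-stage table could be built with one row per stage from the start. This is only a sketch of that idea using dplyr's `summarise()`, not the approach used above; `input_by_stage` is a hypothetical name, and the result would replace `input2` in the ggplot call.

```r
library(dplyr)

df_test <- data.frame(
  proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
  proj_ID = c(1, 2, 3, 4, 5, 6, 7),
  stage = c('B','B','B','A','C','A','C'),
  value = c(15,15,20,20,20,70,5)
)

input_by_stage <- df_test %>%
  filter(proj_manager == 'Emma') %>%
  # Keep each project once per stage before summing its value.
  distinct(proj_ID, stage, value) %>%
  group_by(stage) %>%
  # Collapse to exactly one row per stage, so bars are never stacked.
  summarise(
    sum_value_byStage = sum(value),
    count_proj = n_distinct(proj_ID)
  )

print(input_by_stage)
```

Because each stage appears once, `geom_bar(stat = 'identity')` draws each bar from a single row and its length matches the label.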

For your last question, there is no good way to display both on one axis, because the counts (1 to 3) and the values (20 to 50) are on different scales. One option would be to rescale the counts and still label each bar with its true count (1 or 3). I didn't do this here because, once the blue bars are scaled down to the per-stage sums, the count bars look OK to me.
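If the values were much larger than the counts, the rescaling option might look like the sketch below. It assumes a one-row-per-stage table; `by_stage`, `scale_factor` and `count_scaled` are hypothetical names, and the choice of factor (stretching the longest count bar to the length of the longest value bar) is an arbitrary assumption.

```r
library(ggplot2)

# One row per stage for Emma, using the numbers from the example data.
by_stage <- data.frame(
  stage = c("A", "B", "C"),
  sum_value_byStage = c(20, 50, 20),
  count_proj = c(1, 3, 1)
)

# Assumption: make the longest count bar as long as the longest value bar.
scale_factor <- max(by_stage$sum_value_byStage) / max(by_stage$count_proj)
by_stage$count_scaled <- by_stage$count_proj * scale_factor

p <- ggplot(by_stage, aes(x = stage)) +
  geom_col(aes(y = count_scaled)) +                      # rescaled counts
  geom_col(aes(y = -sum_value_byStage), fill = "blue") + # values, mirrored
  # Labels sit at the end of each bar but show the unscaled numbers.
  geom_text(aes(y = count_scaled, label = count_proj), hjust = -0.5) +
  geom_text(aes(y = -sum_value_byStage, label = sum_value_byStage), hjust = 1.5) +
  geom_hline(yintercept = 0) +
  coord_flip() +
  # The two sides now use different units, so the shared tick labels are dropped.
  scale_y_continuous(labels = NULL) +
  ylab("")
print(p)
```

The text labels carry the real numbers, so the axis itself no longer needs to be readable in either unit.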


Mike H.
  • Thank you for your solution! It helps. For the third question: my real data has projects with values over 2000, which could make the project-count bars extremely small. – Daisywang May 01 '17 at 18:39
  • @Daisywang there's no easy way to adjust this, see http://stackoverflow.com/questions/3099219/plot-with-2-y-axes-one-y-axis-on-the-left-and-another-y-axis-on-the-right. My suggestion would be to rescale your data, or you could draw a separate plot right next to the blue bars that shows the count of projects. – Mike H. May 01 '17 at 19:18