
I want to draw a bar chart in R, using ggplot or another package, in which each bar also shows the values of several other variables.

I would appreciate your help with this. I have attached a figure by Akseer et al. to show the graph I want to draw.

Below I provide sample data to replicate this bar chart.

In the first two code blocks below, the spacing and order of the interventions and groups reflect how the interventions are categorized in the example figure, because not every intervention applies to every group. Also, the group values (national medians) that do not belong to a given intervention in Figure B will need to be dropped after the dataset is created.

Interventions<-c("Demand of family planning satisfied", ## interventions for 1st group

          "ANC 1+",                                 ## interventions for 2nd group
          "ANC 4+", 
          "ANC by skilled provider",
          "Protected against neonatal tetanus",

          "SBA",                                   ## interventions for 3rd group
          "Facility deliveries",

          "Early breastfeeding",                   ## interventions for 4th group

          "Exclusive breastfeeding at 6 months",   ## interventions for 5th group
          "Minimum meal frequency", 
          "BCG", 
          "Penta3", 
          "Measles",
          "Received vitamin A during the last 6 months",

          "Diarrhoea treatment (ORS)",             ## interventions for 6th group
          "Care seeking for pneumonia", 
          "Antibiotics for pneumonia", 

          "Improved drinking water sources",       ## interventions for 7th group
          "Improved sanitation facilities") 

Now I give the groups. Each bar in Figure B shows the national median for each intervention. These first 7 groups are national medians to draw those bars:

set.seed(123)  ## added so the sample data is reproducible; any seed works

Prepregnancy <- sample(1:100, 19, replace = TRUE)  ## 1st group

Pregnancy <- sample(1:100, 19, replace = TRUE)     ## 2nd group

Birth <- sample(1:100, 19, replace = TRUE)         ## 3rd group

Postnatal <- sample(1:100, 19, replace = TRUE)     ## 4th group

Infancy <- sample(1:100, 19, replace = TRUE)       ## 5th group

Childhood <- sample(1:100, 19, replace = TRUE)     ## 6th group

Other <- sample(1:100, 19, replace = TRUE)         ## 7th group

Below is the last part of the data, that is, the data for the group "provincial coverage". One thing to note: unlike the 7 groups above (national medians), every one of these "provincial coverage" variables applies to each of the 19 interventions, as can be seen in Figure B.

Provincial1<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions
Provincial2<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions
Provincial3<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions
Provincial4<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions
Provincial5<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions 
Provincial6<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions 
Provincial7<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions 
Provincial8<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions 
Provincial9<-sample(1:100, 19, replace=TRUE)  ## provincial level observations for each of the 19 interventions 
Provincial10<-sample(1:100, 19, replace=TRUE) ## provincial level observations for each of the 19 interventions 


mydata_B<-data.frame(Interventions, Prepregnancy, Pregnancy, 
                 Birth, Postnatal, Infancy, Childhood, Other, 

                 Provincial1, Provincial2, Provincial3,
                 Provincial4, Provincial5,
                 Provincial6, Provincial7, Provincial8,
                 Provincial9, Provincial10)

rownames(mydata_B) <- mydata_B[,1]
dtFig3B <- mydata_B[,-1]

And, again, those values for groups (national medians) that are not part of a given intervention in Figure B will need to be dropped after the dataset is created.
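
The dropping step could look roughly like the sketch below, shown on three interventions only. The lookup `stage_of`, the small data frame `wide`, and the result `national` are hypothetical names introduced for illustration; the mapping from intervention to group is an assumption read off the comments in the code above.

```r
set.seed(1)

# Small stand-in for mydata_B: three interventions, three group columns.
wide <- data.frame(
  Interventions = c("Demand of family planning satisfied", "ANC 1+", "SBA"),
  Prepregnancy  = sample(1:100, 3),
  Pregnancy     = sample(1:100, 3),
  Birth         = sample(1:100, 3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which group column applies to which intervention.
stage_of <- c("Demand of family planning satisfied" = "Prepregnancy",
              "ANC 1+" = "Pregnancy",
              "SBA"    = "Birth")

group_cols <- c("Prepregnancy", "Pregnancy", "Birth")
applicable <- unname(stage_of[as.character(wide$Interventions)])

# Pick, for each row, the value in its applicable column (matrix indexing);
# medians in the non-applicable columns are thereby dropped.
idx <- cbind(seq_len(nrow(wide)), match(applicable, group_cols))
national <- data.frame(
  Interventions = wide$Interventions,
  group  = applicable,
  median = as.matrix(wide[group_cols])[idx],
  stringsAsFactors = FALSE
)
```

This leaves one row per intervention with a single national median and its group label, which is the shape the answers below work from.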

I would appreciate any ideas on how to reproduce this bar chart in R.

Krantz

3 Answers


This example illustrates how you can use factor(x, levels) to make sure bars in the same group are placed together. In the ggplot call, you can map the grouping variable to the fill aesthetic to visually separate the groups. Use stat = "unique" so that each distinct value is drawn once, instead of the default count (where the height of each bar is the number of corresponding rows in df).

library(ggplot2)

df <- data.frame(x = rep(c("Z", "A", "Y", "B", "X"), each = 5), 
                 value = sample(10:99, 25))

# divide into groups
groups <- c(Z = "g1", A = "g3", Y = "g3", B = "g1", X = "g2")
df$group <- groups[as.character(df$x)]

# set the order of group
df$group <- factor(df$group, c("g1", "g2", "g3"))

# order df by group
df <- df[order(df$group), ]

# reset the order of x accordingly
df$x <- factor(df$x, unique(df$x))

# calculate medians
medians <- tapply(df$value, df$x, median)
df$median <- medians[as.character(df$x)]

# plot, mapping group to fill aesthetic
ggplot(df, aes(x, fill = group)) +
  geom_bar(aes(y = median), stat = "unique") +
  geom_point(aes(y = value)) + 
  labs(y = "values and median")
Jordi
  • This works to draw part of the graph, but I am still unable to reproduce the example. There are other parts remaining. For example, how can I get those values of provincial coverage plotted in each bar? Also, how can I add these vertical lines separating the groups? Thank you! – Krantz Mar 02 '18 at 12:37
  • @Krantz I have extended the example to include the data points as a scatter plot and the medians as bars. As far as I know, `ggplot` has no straightforward way to add vertical lines in between (groups of) bars. – Jordi Mar 02 '18 at 13:13
  • Terrific! How about giving a different colour to each province to make the data points more meaningful? Thanks a lot for your input, @Jordi. – Krantz Mar 02 '18 at 14:57
  • @Krantz I would not recommend coloring the data points, because they might clash with the bar colors. To amplify the visual contrast, you can try using `aes(y = value, shape = x)` in the `geom_point` call, although in my opinion the mapping to the horizontal axis is sufficient. – Jordi Mar 02 '18 at 15:05
  • I see your point. I did `aes(y = value, shape = x)` but I suspect we need to make provinces different from x because what we need is to be able to contrast between values (coverage) of different provinces within a given category of x (intervention). The current situation allows contrasting the shapes between x (interventions: "Z", "A", "Y", "B", "X"), which we have accomplished already with the colours (groups: "g1", "g2", "g3") and the bars (medians). In short, we need the contrasts between provinces to happen within, not between, the levels of x. Thanks in advance! – Krantz Mar 02 '18 at 15:38
  • Assuming you have a column `province` in `df`, that gives the province for every row, you can use `aes(y = value, shape = province)`. – Jordi Mar 02 '18 at 15:44
  • `df` has a column for each province, and also a column for each variable group, and a row for each `intervention` ("Z", "A", "Y", "B", "X"). Each variable group (`g1`, `g2`, and `g3`) has the value of the national median for each intervention, and each variable province (`prov1`, `prov2`, `prov3`, `prov4`, `prov5`) has the value for each intervention for the respective province. – Krantz Mar 02 '18 at 16:04
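
For the wide layout described in the last comment (one row per intervention, one column per province), a possible sketch is to stack the province columns into a single `province` column first, so that `shape = province` can be mapped. The names `wide`, `long`, and `prov_cols` are illustrative, the data are random, and the median across provinces merely stands in for the national median.

```r
library(ggplot2)

set.seed(42)
# Assumed wide layout: one row per intervention, one column per province.
wide <- data.frame(
  intervention = c("Z", "A", "Y", "B", "X"),
  group = c("g1", "g3", "g3", "g1", "g2"),
  prov1 = sample(10:99, 5),
  prov2 = sample(10:99, 5),
  prov3 = sample(10:99, 5),
  stringsAsFactors = FALSE
)
prov_cols <- c("prov1", "prov2", "prov3")

# Stack the province columns: one row per intervention x province.
long <- data.frame(
  intervention = rep(wide$intervention, times = length(prov_cols)),
  group        = rep(wide$group, times = length(prov_cols)),
  province     = rep(prov_cols, each = nrow(wide)),
  value        = as.numeric(unlist(wide[prov_cols], use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Order interventions so members of the same group sit next to each other.
long <- long[order(long$group), ]
long$intervention <- factor(long$intervention, unique(long$intervention))

# Median across provinces (a stand-in for the national median).
long$median <- ave(long$value, long$intervention, FUN = median)

# Bars from one row per intervention, points from every province.
p <- ggplot(long, aes(intervention)) +
  geom_col(aes(y = median, fill = group),
           data = unique(long[c("intervention", "group", "median")])) +
  geom_point(aes(y = value, shape = province)) +
  labs(y = "values and median")
```

Printing `p` draws the plot. With real data, the national medians would come from their own column rather than being recomputed from the provinces.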

This extends the answer from @Jordi above and shows how to put lines between groups, based on this answer. It has been revised to color the points by province and to use alpha on the bars. With many provinces (the question's sample data has ten), the points will be really hard to tell apart by color alone, so some use of shape may be needed, as noted in other comments.

library(ggplot2)

# make data
df = read.csv(text='
group,intervention,province,value
g1,i1,p1,10
g1,i1,p2,12
g1,i2,p1,13
g1,i2,p2,15
g2,i3,p1,18
g2,i3,p2,20
g3,i4,p1,14
g3,i4,p2,16
g3,i5,p1,18
g3,i5,p2,20
', stringsAsFactors = FALSE)

# define ordered factors to control plot orders
df$group = ordered(df$group, levels = c("g3", "g2", "g1")) ## deliberately reversed 
df$intervention = ordered(df$intervention, levels = c("i1", "i2", "i3", "i4", "i5"))

# find the last intervention in each group
library(dplyr)
last_in_group = df  %>%
  group_by(group, intervention) %>%
  summarize() %>%
  group_by(group) %>%
  summarize(x = as.integer(tail(intervention,1)) + .5 ) 

# calculate medians
medians <- tapply(df$value, df$intervention, median)
df$median <- medians[as.character(df$intervention)]

# plot, mapping group to fill aesthetic
ggplot(df, aes(x = intervention, fill = group)) +
  geom_col(aes(y = median, fill = group), width = 0.3, alpha=0.2) +
  geom_point(aes(y = value, col=province)) + 
  geom_vline(xintercept = last_in_group$x, lwd = 0.5, linetype=2, alpha = 0.2) +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "values and median") +
  theme(panel.background = element_rect(fill = "white"))


Andrew Lavers
  • epi99@, that could work. Could you help with the issue of contrasting the points within each bar and also creating a different (separate) legend for the data points? Thanks in advance. – Krantz Mar 03 '18 at 00:25
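
One possible way to address both points in that comment, as a sketch rather than a definitive solution: map `fill` only inside `geom_col` so the bar legend stays separate from the point legend, and dodge the points so provinces sit side by side within each bar. The data repeat the small example above; `bars` and the legend titles are illustrative choices.

```r
library(ggplot2)

# Same toy data as in the answer above, built directly as a data frame.
df <- data.frame(
  group        = c("g1", "g1", "g1", "g1", "g2", "g2", "g3", "g3", "g3", "g3"),
  intervention = c("i1", "i1", "i2", "i2", "i3", "i3", "i4", "i4", "i5", "i5"),
  province     = rep(c("p1", "p2"), times = 5),
  value        = c(10, 12, 13, 15, 18, 20, 14, 16, 18, 20),
  stringsAsFactors = FALSE
)

# One median per intervention, and one row per bar.
df$median <- ave(df$value, df$intervention, FUN = median)
bars <- unique(df[c("group", "intervention", "median")])

p <- ggplot(df, aes(x = intervention)) +
  # fill is mapped only here, so its legend describes the bars alone
  geom_col(aes(y = median, fill = group), data = bars,
           width = 0.5, alpha = 0.3) +
  # colour and shape share one legend; dodging separates provinces within a bar
  geom_point(aes(y = value, colour = province, shape = province),
             position = position_dodge(width = 0.3)) +
  labs(y = "values and median", fill = "Group (median)",
       colour = "Province", shape = "Province")
```

Because `colour` and `shape` map the same variable and share a title, ggplot2 merges them into a single "Province" legend next to the fill legend for the groups.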

This is perhaps a more natural ggplot approach: use facet_grid to produce a single row of panels, with scales = 'free_x' so that each panel only includes the x values it uses, and space = 'free' so that each panel's width adjusts to fit. Additional adjustment of the theme could get close to the desired presentation.

This follows the data structure and example from @Jordi.

# plot, mapping group to fill aesthetic
ggplot(df, aes(x, fill = group)) +
  geom_bar(aes(y = median), stat = "unique", width= 0.3) +
  geom_point(aes(y = value)) + 
  labs(y = "values and median") +
  facet_grid(. ~ group, scales = "free_x", space = "free") 
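

As a rough idea of what the theme adjustment mentioned above could involve, the sketch below removes the gap between panels and draws a border around each, so the panel edges act as separators between groups. The data are rebuilt so the block runs on its own, and the particular theme settings are illustrative choices, not the only way to do it.

```r
library(ggplot2)

set.seed(1)
# Data in the same structure as @Jordi's example, already ordered by group.
df <- data.frame(x = rep(c("Z", "B", "X", "A", "Y"), each = 5),
                 value = sample(10:99, 25),
                 stringsAsFactors = FALSE)
groups <- c(Z = "g1", B = "g1", X = "g2", A = "g3", Y = "g3")
df$group <- factor(unname(groups[df$x]), c("g1", "g2", "g3"))
df$x <- factor(df$x, unique(df$x))
df$median <- ave(as.numeric(df$value), df$x, FUN = median)

p <- ggplot(df, aes(x, fill = group)) +
  geom_bar(aes(y = median), stat = "unique", width = 0.3) +
  geom_point(aes(y = value)) +
  labs(y = "values and median") +
  facet_grid(. ~ group, scales = "free_x", space = "free") +
  theme(panel.spacing = unit(0, "lines"),        # no gap between panels
        panel.border = element_rect(colour = "grey60", fill = NA),
        strip.background = element_blank(),      # plain group labels on top
        legend.position = "none")                # strips already name the groups
```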


Andrew Lavers