I am pretty new to ggplot2 and I would like to draw a histogram of the number of articles published per year (or per 5 years) for a systematic review. I have a data frame like this:

library(ggplot2)

Df <- data.frame(
  name = c("article1", "article2", "article3", "article4"),
  date = c(2004, 2009, 1999, 2007),
  question1 = c(1, 0, 1, 0),
  question2 = c(1, 1, 1, 1),
  question3 = c(1, 1, 1, 1),
  question4 = c(0, 0, 0, 0),
  question5 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

ggplot(Df, aes(date)) +
  geom_histogram(binwidth = 5, color = "black")

In addition, I would like to fill each bar of the histogram according to the number of articles that covered a particular type of question (questions 1 to 5, coded 1 or 0 depending on whether the question is present or absent). The thing is, I have 5 questions that I would like to make visible in one diagram, and I don't know how to do that. I tried the fill argument and also tried geom_bar, but both failed.

Thanks so much in advance for your help

  • Images are not a good way for posting data (or code). See [this Meta](https://meta.stackoverflow.com/a/285557/8245406) and a [relevant xkcd](https://xkcd.com/2116/). Can you post sample data in `dput` format? Please edit **the question** with the code you've tried and with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. (Note: `df` is the name of your dataset.) – Rui Barradas Apr 21 '21 at 09:37
  • So sorry, better now? – Laudine Carbuccia Apr 21 '21 at 09:53

1 Answer

Here is a way: a simple stacked bar plot with ggplot2.
Problems of this type usually come down to reshaping the data. The data is in wide format, but ggplot2 works best with data in long format. See this post on how to reshape the data from wide to long format.

library(dplyr)
library(tidyr)
library(ggplot2)

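t %>%
  select(-Code) %>%
  # wide to long: one row per article/question pair, 0/1 flag in column `value`
  pivot_longer(
    cols = starts_with("Question"),
    names_to = "Question"
  ) %>%
  # keep only the questions an article actually covers (coded 1);
  # without this, geom_bar() would also count the rows coded 0
  filter(value == 1) %>%
  # treat the year as a discrete axis, one bar per year
  mutate(Publication_date = factor(Publication_date)) %>%
  ggplot(aes(Publication_date, fill = Question)) +
  geom_bar() +
  xlab("Publication Date")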

Test data

set.seed(2021)
n <- 200
Code <- paste0("Article", 1:n)
Publication_date <- sample(2000:2020, n, TRUE)
Question <- replicate(5, rbinom(n, 1, 0.5))
colnames(Question) <- paste0("Question", 1:5)

t <- data.frame(Code, Publication_date)
t <- cbind(t, Question)
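
The same reshaping idea can be applied to the `Df` from the question, with the years grouped into 5-year periods as the question asks. This is only a minimal sketch under some assumptions: the column names `present`, `period_start` and `period` are made up for illustration, the 5-year periods are assumed to start at multiples of 5 (1995, 2000, ...), and only rows coded 1 are assumed to count.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data frame from the question
Df <- data.frame(
  name = c("article1", "article2", "article3", "article4"),
  date = c(2004, 2009, 1999, 2007),
  question1 = c(1, 0, 1, 0),
  question2 = c(1, 1, 1, 1),
  question3 = c(1, 1, 1, 1),
  question4 = c(0, 0, 0, 0),
  question5 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

long <- Df %>%
  # wide to long: one row per article/question pair
  pivot_longer(
    cols = starts_with("question"),
    names_to = "question",
    values_to = "present"      # hypothetical column name
  ) %>%
  # keep only the questions each article actually covers
  filter(present == 1) %>%
  # assumed binning: periods start at multiples of 5
  mutate(
    period_start = 5 * floor(date / 5),
    period = paste(period_start, period_start + 4, sep = "-")
  )

p <- ggplot(long, aes(period, fill = question)) +
  geom_bar() +
  labs(
    x = "Publication period",
    y = "Articles covering each question",
    fill = "Question"
  )
p
```

Note that an article covering several questions contributes one unit per question, so the total height of a stacked bar is the number of article/question pairs, not the number of articles. If side-by-side bars per question are easier to read, `geom_bar(position = "dodge")` is one alternative.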
Rui Barradas