Plot a data frame with different values and categories on a quarterly basis in R as a histogram

Question

I have a problem with my dataset and I would like to create a histogram showing the distribution of observations per quarter. The observations are divided into 3 categories and should be displayed per quarter. Unfortunately, I have not yet found out how it works. With a "normal" plot I have added the respective observation points via the command "points", but I don't quite know how this works with histograms. Any help is greatly appreciated!

This is a sample of my data (in reality there are about 20 years of observations with their categories):

            date    n           Categorie1 Categorie2 Categorie3
         2015 Q1    67                5          2          1
         2015 Q2    71                3          4          2
         2015 Q3    69                2         10         11
         2015 Q4    62                1          0          0 
         2016 Q1    69                2          2          1
         2016 Q2    61                3          5          0
         2016 Q3    63                3          2          7

The variable "date" is in the format 'yearqtr'.

I have used the following code to produce the normal plot:

plot(df23$date, df23$Categorie1, col = "red", pch = 16,
     xlab = "Quarter", ylab = "Occurencies")
points(df23$date, df23$Categorie2, col = "blue", pch = 16)
points(df23$date, df23$Categorie3, col = "green", pch = 16)

So I get a graph, but I don't know how to get this histogram for the different categories per quarter.

Can you please help me to create the histogram for the different categories and the different quarters?

Many thanks in advance!

This is my dput(df23):

>  dput(df23)
structure(list(date = structure(c(2015, 2015.25, 2015.5, 2015.75, 
2016, 2016.25, 2016.5), class = "yearqtr"), n = c(67, 71, 69, 
62, 69, 61, 63), Categorie1 = c(5, 3, 2, 1, 2, 3, 3), Categorie2 = c(2, 
4, 10, 0, 2, 5, 2), Categorie3 = c(1, 2, 11, 0, 1, 0, 7)), row.names = c(NA, 
7L), class = "data.frame")

Could you please paste your data into the question using `dput(df23)` to make it easily reproducible? [Guidance for making a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) Do you need a base R graphics solution? — Peter, Aug 28 '21 at 09:19
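
For a base R graphics version, a grouped bar chart is probably closer to what the question describes than a true histogram, since the counts per quarter already exist. The following is only a minimal sketch: it assumes the quarter labels are written as plain strings (what `format(df23$date)` prints when zoo is loaded) and uses the hypothetical names `counts` and `bar_mids`.

```r
# Sample data from the question; quarter labels are plain strings here
# (an assumption made to keep the sketch free of package dependencies).
quarters <- c("2015 Q1", "2015 Q2", "2015 Q3", "2015 Q4",
              "2016 Q1", "2016 Q2", "2016 Q3")

# One row per category, one column per quarter
counts <- rbind(
  Categorie1 = c(5, 3, 2, 1, 2, 3, 3),
  Categorie2 = c(2, 4, 10, 0, 2, 5, 2),
  Categorie3 = c(1, 2, 11, 0, 1, 0, 7)
)
colnames(counts) <- quarters

# beside = TRUE draws the categories side by side within each quarter
# instead of stacking them; barplot() returns the bar midpoints.
bar_mids <- barplot(
  counts,
  beside = TRUE,
  col = c("red", "blue", "green"),
  xlab = "Quarter",
  ylab = "Occurencies",
  legend.text = rownames(counts)
)
```

With 20 years of quarters the x-axis gets crowded, so `las = 2` or a smaller `cex.names` may be worth trying.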

score 3 · Answer 1 · answered Aug 28 '21 at 09:32

Using a tidyverse approach we could do

library(ggplot2)
library(tidyr)
library(dplyr)
library(zoo) # format "yearqtr"


df23 %>% 
  pivot_longer(
    starts_with("Categorie"), 
    names_to = "Categorie",
    names_pattern = "Categorie(\\d+)"
  ) %>% 
  ggplot(aes(x = date, y = value, fill = Categorie)) + 
  geom_bar(position = "dodge", stat = "identity")

to get a dodged bar chart with one bar per category in each quarter.

Data

df23 <- structure(list(date = structure(c(2015, 2015.25, 2015.5, 2015.75, 
2016, 2016.25, 2016.5), class = "yearqtr"), n = c(67, 71, 69, 
62, 69, 61, 63), Categorie1 = c(5, 3, 2, 1, 2, 3, 3), Categorie2 = c(2, 
4, 10, 0, 2, 5, 2), Categorie3 = c(1, 2, 11, 0, 1, 0, 7)), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

Thanks! How would you adjust the code, if the categories have different names? So for example categorie 1 is named "daimler", categorie 2 is named "opel" and categorie 3 is named "renault"? — Beginner_in_R, Aug 28 '21 at 09:40
If your categories are named differently, replace the `pivot_longer` statement with `pivot_longer(-c(date, n), names_to = "Categorie")`. — Martin Gal, Aug 28 '21 at 09:43
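
As a rough illustration of the suggestion in the comment above, the sketch below pivots every column except `date` and `n`, so the column names themselves become the category labels. The data frame name `cars`, the plain-string quarter labels, and the assignment of the sample values to the brand columns are assumptions made for this example.

```r
library(tidyr)

# Sample data with the category columns renamed to the brand names from the
# comment (hypothetical mapping); quarter labels are plain strings here.
cars <- data.frame(
  date    = c("2015 Q1", "2015 Q2", "2015 Q3", "2015 Q4",
              "2016 Q1", "2016 Q2", "2016 Q3"),
  n       = c(67, 71, 69, 62, 69, 61, 63),
  daimler = c(5, 3, 2, 1, 2, 3, 3),
  opel    = c(2, 4, 10, 0, 2, 5, 2),
  renault = c(1, 2, 11, 0, 1, 0, 7)
)

# Pivot everything except date and n; the former column names land in
# "Categorie" and the counts in the default "value" column.
cars_long <- pivot_longer(cars, -c(date, n), names_to = "Categorie")

head(cars_long, 3)
```

Because the selection is negative, no `names_pattern` is needed, and any further category columns are picked up automatically.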

score 3 · Accepted Answer · answered Aug 28 '21 at 09:56

Martin Gal's approach is very good. Here is an alternative way:

Bring your data into long format with pivot_longer from the tidyr package (part of the tidyverse).
As your date column is already fine, you could wrap it in factor() without any date formatting (otherwise see Martin Gal's solution).
You could also use geom_col instead of geom_bar with stat = "identity".

library(tidyverse)

df23 %>% 
    pivot_longer(
        cols = -c(date, n),
        names_to = "Categorie"
    ) %>% 
    ggplot(aes(x = factor(date), y = value, fill = Categorie)) + 
    geom_col(position = "dodge")
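
Since the plot above puts a factor on the x-axis, the order of the bars follows the order of the factor levels. The sketch below shows how that order can go wrong and one way to pin it down. It assumes hypothetical quarter labels stored as plain strings in a "Q1 2015" style, which is not the format of the sample data.

```r
# Hypothetical quarter labels as plain strings, already in chronological order
quarters <- c("Q1 2015", "Q2 2015", "Q3 2015", "Q4 2015", "Q1 2016", "Q2 2016")

# By default factor() sorts the levels, which for strings is alphabetical,
# so the quarters of different years get interleaved.
levels(factor(quarters))

# Passing the levels explicitly keeps the order in which the labels appear.
ordered_quarters <- factor(quarters, levels = unique(quarters))
levels(ordered_quarters)
```

Labels of the form "2015 Q1" happen to sort chronologically, so the explicit `levels` argument mainly matters for other label formats.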

answered by TarJae

When converting to a factor, make sure the dates stay in the correct order. +1 – Martin Gal Aug 28 '21 at 11:26
Thank you very much! That worked very well! Is it possible to rename the axes in this histogram? So e.g. call the x-axis "Quarter" and the y-axis "Occurencies"? – Beginner_in_R Aug 28 '21 at 15:49
Add `+ xlab("Quarter") + ylab("Occurencies")` – Martin Gal Aug 28 '21 at 15:53
Thank you! Do you also know how to adjust the colors of the histogram? So you can choose the colors for the different categories? – Beginner_in_R Aug 28 '21 at 15:59
Add this line to the code: `+ scale_fill_manual(values = c("red", "blue", "green"))` – TarJae Aug 28 '21 at 16:27
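
Putting the axis labels and colours from the comments together, a complete version might look like the sketch below. It is only an illustration: the names `df`, `long`, and `p` are hypothetical, the quarter labels are plain strings rather than `yearqtr` values, and the colour vector is named so that each colour is tied to a category regardless of level order.

```r
library(ggplot2)
library(tidyr)

# Sample data from the question with plain-string quarter labels (assumption)
df <- data.frame(
  date       = c("2015 Q1", "2015 Q2", "2015 Q3", "2015 Q4",
                 "2016 Q1", "2016 Q2", "2016 Q3"),
  Categorie1 = c(5, 3, 2, 1, 2, 3, 3),
  Categorie2 = c(2, 4, 10, 0, 2, 5, 2),
  Categorie3 = c(1, 2, 11, 0, 1, 0, 7)
)

# Long format: one row per quarter and category
long <- pivot_longer(df, -date, names_to = "Categorie")

p <- ggplot(long, aes(x = date, y = value, fill = Categorie)) +
  geom_col(position = "dodge") +
  xlab("Quarter") +
  ylab("Occurencies") +
  # Named values map each colour to a specific category
  scale_fill_manual(values = c(Categorie1 = "red",
                               Categorie2 = "blue",
                               Categorie3 = "green"))

print(p)
```

An unnamed vector such as `c("red", "blue", "green")` also works; the colours are then assigned in the order of the factor levels.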
