
I'm new to RStudio and fairly new to statistics.

I got lost in the dbplyr and ggplot tutorials, since most of them cover only numerical data, which is a little different from my case.

I have a huge data set and would like to run a statistical analysis on it. The data are text and dates rather than numbers. A sample is posted below.

The column "ORIGIN" has two values (A and B). I would like to count these values automatically and plot the counts on the same graph, so that I can see the total number of orders for A and for B, separated by whether they were received in 2019 or 2020.

My CSV sample:

ORIGIN;RECEIVEMENT;DELIVERY
A;01-01-2019;12-01-2019
B;01-03-2019;13-03-2019
A;31-12-2019;11-01-2020
A;21-02-2020;04-03-2020
A;08-09-2020;19-09-2020
A;28-01-2020;09-02-2020
A;02-03-2019;13-03-2019
B;04-06-2020;16-06-2020
A;24-07-2019;04-08-2019
B;03-05-2020;15-05-2020
B;08-08-2019;19-08-2019
B;03-08-2020;14-08-2020
A;20-03-2019;31-03-2019
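
For anyone who wants to reproduce this, here is a minimal sketch of one way to load the sample into R. It assumes the rows above are pasted into an inline string (`csv_text` is a hypothetical name); the data frame is called `df` only because that is the name used in the answer below. For a real file, the path would be passed to `read.csv()` instead of `text =`.

```r
# Sample rows copied from the question (semicolon-separated, dates as dd-mm-yyyy).
csv_text <- "ORIGIN;RECEIVEMENT;DELIVERY
A;01-01-2019;12-01-2019
B;01-03-2019;13-03-2019
A;31-12-2019;11-01-2020
A;21-02-2020;04-03-2020
A;08-09-2020;19-09-2020
A;28-01-2020;09-02-2020
A;02-03-2019;13-03-2019
B;04-06-2020;16-06-2020
A;24-07-2019;04-08-2019
B;03-05-2020;15-05-2020
B;08-08-2019;19-08-2019
B;03-08-2020;14-08-2020
A;20-03-2019;31-03-2019"

# sep = ";" because the columns are separated by semicolons, not commas.
# Keep the columns as plain character strings; dates are parsed later.
df <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

# Inspect the column names and types.
str(df)
```

At this point the two date columns are still character strings, so they need to be converted before a year can be extracted from them.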

Edit: I removed the TOTAL column.

Sorry, I use Google Translate; thank you very much for your help. I expressed myself badly. Please disregard the TOTAL column; what I posted is a sample of my problem.

I would like to count how many A and B were delivered in 2019 and 2020.

Phil
wesleysc352
  • Welcome to SO! Your question is too broad and you don't show what you've tried so far. Also, great that you include data, but please use `dput` so it's easier to copy it. Have a look how to make [minimal reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – starja Sep 20 '20 at 19:06
  • in general, you can use dplyr to do the counting: `data %>% group_by(ORIGIN) %>% summarise(count = n())` – starja Sep 20 '20 at 19:08
  • Sorry, I use Google Translate. I will try your answer. – wesleysc352 Sep 20 '20 at 19:39
  • Have a look at https://pt.stackoverflow.com/ :) – starja Sep 20 '20 at 19:42

1 Answer


It's difficult to know exactly what you are looking for from your description. I'm guessing you want a count of orders by origin and year:

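library(dplyr)
library(ggplot2)

# df is assumed to hold the sample data from the question
df %>% 
  # parse both date columns (day-month-year) into date-times
  mutate(across(2:3, function(x) as.POSIXct(strptime(x, "%d-%m-%Y")))) %>%
  # year of receipt as a factor, so the x axis is discrete
  mutate(year = factor(lubridate::year(RECEIVEMENT))) %>%
  # one row per ORIGIN/year combination with its number of orders
  group_by(ORIGIN, year) %>%
  summarize(count = n(), .groups = "drop") %>%
  ggplot(aes(year, count)) +
  # side-by-side bars for A and B within each year
  geom_col(aes(fill = ORIGIN), colour = "black", width = 0.5,
           position = position_dodge()) +
  scale_fill_manual(values = c("gold", "deepskyblue4")) +
  theme_bw() +
  labs(title = "Pedidos Anuais", x = "Ano", y = "Contagem")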
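
If only the numbers are needed, the same counts can be checked without a plot. The following is a base R sketch rather than part of the pipeline above: it rebuilds the ORIGIN and RECEIVEMENT columns of the question's sample by hand, and `received_year` and `counts` are illustrative names. It groups by year of receipt, as the code above does; swapping in the DELIVERY column would count by delivery year instead.

```r
# ORIGIN and RECEIVEMENT columns of the question's sample, typed in by hand.
df <- data.frame(
  ORIGIN = c("A", "B", "A", "A", "A", "A", "A", "B", "A", "B", "B", "B", "A"),
  RECEIVEMENT = c("01-01-2019", "01-03-2019", "31-12-2019", "21-02-2020",
                  "08-09-2020", "28-01-2020", "02-03-2019", "04-06-2020",
                  "24-07-2019", "03-05-2020", "08-08-2019", "03-08-2020",
                  "20-03-2019"),
  stringsAsFactors = FALSE
)

# Parse the day-month-year strings and keep only the four-digit year.
received_year <- format(as.Date(df$RECEIVEMENT, format = "%d-%m-%Y"), "%Y")

# Cross-tabulate origin against year of receipt.
counts <- table(ORIGIN = df$ORIGIN, year = received_year)
print(counts)
```

For the sample this gives 5 orders for A and 2 for B in 2019, and 3 each in 2020.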


Allan Cameron