0

I have a data.frame which looks more or less like this one but with 195780 observations:

structure(list(fecha = structure(c(17897, 17897, 17897, 17897, 
17897, 17897, 17897, 17897, 17897, 17897, 17897, 17897, 17897, 
17897, 17897, 18255, 18255, 18255, 18255, 18255, 18255, 18256, 
18256, 18256, 18256, 18256, 18256, 18256, 18256, 18256, 18256, 
18358, 18358, 18358, 18358, 18358, 18358, 18358, 18358, 18358, 
18358, 18358, 18358, 18358, 18358, 18358, 18358, 18900, 18900, 
18900, 18900, 18900, 18900, 18900, 18900, 18900, 18900, 18900
), class = "Date"), año = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L), .Label = c("2019", "2020", "2021"), class = "factor"), 
    mes = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
    12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 4L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 
    9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("Enero", "Febrero", 
    "Marzo", "Abril", "Mayo", "Junio", "Julio", "Agosto", "Septiembre", 
    "Octubre", "Noviembre", "Diciembre"), class = "factor"), 
    tipo_dia = c("F", "F", "F", "F", "F", "F", "F", "F", "F", 
    "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", 
    "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", 
    "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", 
    "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", 
    "L")), row.names = c(NA, -58L), class = c("tbl_df", "tbl", 
"data.frame"))

What I want is to obtain the count of tipo_dia for year and month but just the count of unique values i.e.: there are N "F" days in Enero 2019, K "L" days in Enero 2020, and so on.

I've tried

df%>%
  group_by(año,mes,tipo_dia)%>%
  summarise(tipo_dia_conteo=sum(unique(tipo_dia)))

Which gives me this error:

Error: Problem with `summarise()` column `tipo_dia_conteo`.
i `tipo_dia_conteo = sum(unique(tipo_dia))`

And also:

df%>%
  group_by(año,mes,tipo_dia)%>%
  summarise(tipo_dia_conteo=length(unique(tipo_dia)))

Which gives me a result that I clearly don't want: (Note that here appears some "S" because I'm working with my full database)

# A tibble: 121 x 4
# Groups:   año, mes [33]
   año   mes     tipo_dia tipo_dia_conteo
   <fct> <fct>   <chr>              <int>
 1 2019  Enero   D                      1
 2 2019  Enero   F                      1
 3 2019  Enero   L                      1
 4 2019  Enero   S                      1
 5 2019  Febrero D                      1
 6 2019  Febrero F                      1
 7 2019  Febrero L                      1
 8 2019  Febrero S                      1
 9 2019  Marzo   D                      1
10 2019  Marzo   F                      1

So basically what I expect is the number of "D" days for Enero 2019, "L" and "S" days for same month and so on to September 2021. I feel like kinda stuck so any help will be much appreciated.

Alejandro Carrera
  • 513
  • 1
  • 4
  • 14
  • Don't group by the column you are trying to count. `df %>% group_by(año, mes) %>% summarise(tipo_dia_conteo=n_distinct(tipo_dia))`. Otherwise there will only be that one value in each of the groups. – MrFlick Oct 22 '21 at 22:08
  • I've already tried that but didn't work. The table above is the closest thing to the output I need. – Alejandro Carrera Oct 22 '21 at 22:11
  • Is this what you wanted? `df %>% group_by(año, mes) %>% count(tipo_dia, name = "tipo_dia_conteo")` It would be helpful to give the desired output for the specific sample input you provided so possible solutions can be tested. – MrFlick Oct 22 '21 at 22:13
  • That worked but before I had to extract all unique values with duplicated. Thanks, man. – Alejandro Carrera Oct 22 '21 at 22:51
  • Or with only `count` - `df %>% count(fecha, año, tipo_dia)` – Ronak Shah Oct 23 '21 at 01:02

1 Answers1

0

Are you looking for:

library(tidyverse)
df %>%
  group_by(ano, mes) %>%
  count(tipo_dia)

which gives:

# A tibble: 5 x 4
# Groups:   ano, mes [4]
  ano   mes        tipo_dia     n
  <fct> <fct>      <chr>    <int>
1 2019  Enero      F           15
2 2019  Diciembre  F            6
3 2019  Diciembre  L           10
4 2020  Abril      L           16
5 2021  Septiembre L           11

Note that I had to rename the año column to ano in my example. Not sure if that‘s just an issue for me.

deschen
  • 10,012
  • 3
  • 27
  • 50