1

For each year, I would like to create two new columns temp_count and rh_count counting the number of occurrences in each temp_catog and humidity_catog columns, respectively. This How to count how many values per level in a given factor? answers if you group by one variable, but I would like to use group_by(year, humidity_catog, temp_catog). Here is a screenshot of my data

enter image description here

I can a create a single column humidity_count to count number of occurrences in each category humidity_catog column using the following code.

df <- group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())

Here is the output

enter image description here

But I would like to create another column temp_count in the same data frame to count the number of each category temp_count column. How can I achieve this? Here is the reproducible example of my data created via dput function.

df <- structure(
  list(
    year = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L),
      .Label = c(
        "2006",
        "2007",
        "2012",
        "2013",
        "2014",
        "2014_c",
        "2015_a",
        "2015_b",
        "2016",
        "2017",
        "2020"
      ),
      class = "factor"
    ),
    min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
               47.1),
    min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
                 15.1, 11.8, 9.5),
    temp_catog = structure(
      c(2L, 2L, 3L, 3L,
        2L, 3L, 2L, 3L, 2L, 2L),
      .Label = c("T1(<=8)", "T2(>8, <=15)",
                 "T3(>15)"),
      class = "factor"
    ),
    humidity_catog = structure(
      c(1L,
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
      .Label = c("RH1(<=65)",
                 "RH2(>65)"),
      class = "factor"
    )
  ),
  class = c("grouped_df",
            "tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-10L),
  groups = structure(
    list(
      year = structure(
        1L,
        .Label = c(
          "2006",
          "2007",
          "2012",
          "2013",
          "2014",
          "2014_c",
          "2015_a",
          "2015_b",
          "2016",
          "2017",
          "2020"
        ),
        class = "factor"
      ),
      .rows = structure(
        list(1:10),
        ptype = integer(0),
        class = c("vctrs_list_of",
                  "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-1L),
    .drop = TRUE
  )
)

Note: I don't want unique occurrences. I just want to count how many times each category was recorded.

Ahsk
  • 241
  • 1
  • 7
  • 1
    use `count(...)` function or `tally` or `group_by(...)%>%mutate(n=n())` etc – Onyambu Mar 01 '23 at 23:25
  • @onyambu It's factor variable - those wont' work. I can create two different df using `df1 <- group_by(year, humidity_catog) %>% summarize(humidity_count = n())` and `df2 <- group_by(year, temp_catog) %>% summarize(temp_count = n())` and then combine them but I want an elegant solution – Ahsk Mar 02 '23 at 01:16

1 Answers1

1

Not so sure how the OP means to merge the two summarised results, but we may call mutate instead of summarise, sequentially, feeding the grouping variables to the .by argument.

obs: the toy dataframe was grouped by year, I ungrouped it beforehand

library(dplyr) #requires dplyr 1.1.0 for the .by solution

df %>%
    ungroup() %>%
    mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
    mutate(temp_count = n(), .by = c(year, temp_catog))

# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
GuedesBF
  • 8,409
  • 5
  • 19
  • 37
  • Yes, joining df doesn't answer the question. Maybe it's not possible to get two summary statistics for counts. I can do that with mutate too. My point was to get a column like `humidity_count` in the screenshot. There are two `humidity_catog` <=65 and >65. So for 2006, I have only two counts. For temperature, I have three categories, so I should get only `three` counts for 2006 (each year). – Ahsk Mar 02 '23 at 12:38