
I am having trouble counting a categorical column. Thanks in advance!

So I have this dataframe:

Employee ID  Station
1001         Produce
1002         Pharmacy
1001         Frozen
1003

I want to add a column to the data frame that counts how many stations each employee had.

Example of output:

Employee ID  Station   Count
1001         Produce   2
1002         Pharmacy  1
1001         Frozen    2
1003                   0
lioness22

1 Answer


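The function count() from the dplyr package does exactly that. Count the non-missing stations per employee, then join the counts back onto the original data and replace the missing count for employees with no station by 0: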

library(tidyverse)

data <- tribble(
  ~Employee.ID, ~Station,
  1001L, "Produce",
  1002L, "Pharmacy",
  1001L, "Frozen",
  1003L, NA
)

counts <-
  data %>%
  filter(!is.na(Station)) %>%
  count(Employee.ID)

counts
#> # A tibble: 2 x 2
#>   Employee.ID     n
#>         <int> <int>
#> 1        1001     2
#> 2        1002     1

data %>%
  left_join(counts) %>%
  mutate(n = n %>% replace_na(0))
#> Joining, by = "Employee.ID"
#> # A tibble: 4 x 3
#>   Employee.ID Station      n
#>         <int> <chr>    <dbl>
#> 1        1001 Produce      2
#> 2        1002 Pharmacy     1
#> 3        1001 Frozen       2
#> 4        1003 <NA>         0

Created on 2021-12-08 by the reprex package (v2.0.1)
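If you would rather not build a separate summary table and join it back, a grouped mutate() can add the column in one step. This is a minimal sketch, assuming the same column names as the tribble above; the output column names Count and Distinct are chosen here for illustration. Count sums the non-missing stations per employee, while Distinct uses n_distinct() so that an employee listed twice at the same station is only counted once.

```r
library(dplyr)

# Same example data as above; employee 1003 has no station (NA)
data <- tibble(
  Employee.ID = c(1001L, 1002L, 1001L, 1003L),
  Station = c("Produce", "Pharmacy", "Frozen", NA)
)

result <- data %>%
  group_by(Employee.ID) %>%
  mutate(
    # Number of rows with a non-missing station for this employee
    Count = sum(!is.na(Station)),
    # Number of different stations, ignoring NA
    Distinct = n_distinct(Station, na.rm = TRUE)
  ) %>%
  ungroup()

result
```

Because mutate() keeps every row, the original row order is preserved and employee 1003 gets 0 without needing a join or replace_na().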

danlooo
  • Hi, thank you for responding. That counts the **Employee ID** column, but I basically want to count the **Station** column – lioness22 Dec 08 '21 at 11:55
  • My answer creates the desired example output and counts the number of rows with a non-missing station per employee. Since one employee can have worked at multiple stations, n represents the number of stations per employee. – danlooo Dec 08 '21 at 12:07
  • Thank you! My data frame had a blank in the last row, so it did not work until I replaced it with NA, sorry! – lioness22 Dec 10 '21 at 03:03
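If blank stations arrive as empty strings rather than NA, filter(!is.na(Station)) keeps them and they get counted. One way to handle this is to convert them with dplyr's na_if() before counting. This is a minimal sketch; the empty-string data below is made up to mirror the situation described in the comment, and trimws() is added as an assumption so that whitespace-only cells are treated as blank too.

```r
library(dplyr)

# Hypothetical data: the last Station is an empty string instead of NA
data <- tibble(
  Employee.ID = c(1001L, 1002L, 1001L, 1003L),
  Station = c("Produce", "Pharmacy", "Frozen", "")
)

# Treat empty or whitespace-only stations as missing
data <- data %>%
  mutate(Station = na_if(trimws(Station), ""))

# Same approach as the answer: count non-missing stations, then join back
counts <- data %>%
  filter(!is.na(Station)) %>%
  count(Employee.ID)

result <- data %>%
  left_join(counts, by = "Employee.ID") %>%
  mutate(n = coalesce(n, 0L))  # employees with no station get 0

result
```

Passing by = "Employee.ID" explicitly also silences the "Joining, by = ..." message shown in the answer's output.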