
I have data on 30 people that includes ethnicity, gender, school type, whether they received free school meals, etc.

I want to produce frequency counts for all of these features. Currently my code looks like this:

library(dplyr)

df <- read.csv("~file")
df %>% select(Ethnicity) %>% group_by(Ethnicity) %>% summarise(freq = n())
df %>% select(Gender) %>% group_by(Gender) %>% summarise(freq = n())
df %>% select(School.type) %>% group_by(School.type) %>% summarise(freq = n())

Is there a way I can create a frequency tibble for 8 columns (e.g. ethnicity, gender, school type, etc.) in a more efficient way (e.g. 1 or 2 lines of code)?

As an example output for the ethnicity code:

# A tibble: 13 × 2
   Ethnicity                             freq
   <chr>                                <int>
 1 Asian or Asian British - Bangladeshi     1
 2 Asian or Asian British - Indian          7
 3 Asian or Asian British - Pakistani       1
 4 Black or Black British - African         5
 5 Black or Black British - Caribbean       2
 6 Chinese                                  3
 7 Mixed - White and Asian                  2
 8 Mixed - White and Black African          1
 9 Mixed - White and Black Caribbean        1
10 Not known/ prefer not to say             1
11 White British                           27
12 White Irish                              1
13 White Other                              5

And for gender:

# A tibble: 2 × 2
  Gender  freq
  <chr>  <int>
1 Female    36
2 Male      21

NB: some columns contain postcodes and names, which I don't want frequency counts for, so I think I'll need to select just the columns I want to tabulate.

Jess
  • Can you describe how you want your table to look? If your variables have different numbers of categories then each of your columns in your big table will have different lengths? – George Savva Jan 30 '23 at 11:32
  • Please include a minimal example, e.g. with `dput(head(dat))`, and ideally an expected output. – Andre Wildberg Jan 30 '23 at 11:44
  • I'm happy with a table for each variable (which I assume is necessary since they have different lengths - like ethnicity has 13 categories, while gender only has 2). – Jess Jan 30 '23 at 12:00

1 Answer


One option would be to use lapply to loop over a vector of your desired columns and dplyr::count for the frequency table.

Using the starwars dataset as example data:

library(dplyr, warn.conflicts = FALSE)

cols <- c("hair_color", "sex")

lapply(cols, function(x) {
  count(starwars, .data[[x]], name = "freq")
})
#> [[1]]
#> # A tibble: 13 × 2
#>    hair_color     freq
#>    <chr>         <int>
#>  1 auburn            1
#>  2 auburn, grey      1
#>  3 auburn, white     1
#>  4 black            13
#>  5 blond             3
#>  6 blonde            1
#>  7 brown            18
#>  8 brown, grey       1
#>  9 grey              1
#> 10 none             37
#> 11 unknown           1
#> 12 white             4
#> 13 <NA>              5
#> 
#> [[2]]
#> # A tibble: 5 × 2
#>   sex             freq
#>   <chr>          <int>
#> 1 female            16
#> 2 hermaphroditic     1
#> 3 male              60
#> 4 none               6
#> 5 <NA>               4
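
To adapt this to the data in the question, a minimal sketch might look like the following. The data frame here is a made-up stand-in (only the column names Ethnicity, Gender and School.type come from the question; the values and the Name column are invented for illustration). Naming the vector with setNames() is an optional extra so that each table in the result can be pulled out by column name; columns such as name or postcode are skipped simply by leaving them out of cols.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the asker's data: column names follow the
# question, the values are invented for illustration only
df <- data.frame(
  Name        = c("A", "B", "C", "D"),
  Ethnicity   = c("Chinese", "White British", "White British", "Chinese"),
  Gender      = c("Female", "Male", "Female", "Female"),
  School.type = c("State", "State", "Independent", "State")
)

# Only the columns to tabulate; Name is left out on purpose
cols <- c("Ethnicity", "Gender", "School.type")

# lapply() keeps the names of its input, so naming the vector
# gives a list of frequency tables named by column
freqs <- lapply(setNames(cols, cols), function(x) {
  count(df, .data[[x]], name = "freq")
})

# Pull out a single table by name
freqs$Gender
```

Each element of freqs is a separate table, so columns with different numbers of categories are not a problem.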
stefan