Edit (2021-03-29): tidyverse
Principles
Here is some updated code that utilizes tidyverse
, specifically functions from dplyr
, tibble
, and purrr
. The code is a bit more readable and easier to carry out as well. Example data set is provided.
tibble(
a = rep(c(1:3), 2),
b = factor(rep(c("Jan", "Feb", "Mar"), 2)),
c = factor(rep(LETTERS[1:3], 2))
) ->
dat
dat #print df
# A tibble: 6 x 3
a b c
<int> <fct> <fct>
1 1 Jan A
2 2 Feb B
3 3 Mar C
4 1 Jan A
5 2 Feb B
6 3 Mar C
Get counts and proportions across columns.
library(purrr)
library(dplyr)
library(tibble)
#library(tidyverse) #to load assortment of pkgs
#output tables - I like to use parentheses & specifying my funs
purrr::map(
dat, function(.x) {
count(tibble(x = .x), x) %>%
mutate(pct = (n / sum(n) * 100))
})
#here is the same code but more concise (tidy eval)
purrr::map(dat, ~ count(tibble(x = .x), x) %>%
mutate(pct = (n / sum(n) * 100)))
$a
# A tibble: 6 x 3
x n pct
<int> <int> <dbl>
1 1 1 16.7
2 2 1 16.7
3 3 1 16.7
4 4 1 16.7
5 5 1 16.7
6 6 1 16.7
$b
# A tibble: 3 x 3
x n pct
<fct> <int> <dbl>
1 Feb 2 33.3
2 Jan 2 33.3
3 Mar 2 33.3
$c
# A tibble: 2 x 3
x n pct
<fct> <int> <dbl>
1 A 3 50
2 B 3 50
Old code...
The table()
function returns a "table" object, which is nigh impossible to manipulate using R in my experience. I tend to just write my own function to circumvent this issue. Let's first create a data frame with some categorical variables/features (wide formatted data).
We can use lapply()
in conjunction with the table()
function found in base R to create a list of frequency counts for each feature.
freqList = lapply(select_if(dat, is.factor),
function(x) {
df = data.frame(table(x))
names(df) = c("x", "y")
return(df)
}
)
This approach allows each list object to be easily indexed and further manipulated if necessary, which can be really handy with data frames containing a lot of features. Use print(freqList)
to view all of the frequency tables.