I am looking for the most frequent value (a character string) in each column, together with its frequency.
The intended result is a data frame with three columns:
char: the names of the original columns
mode: the most frequent value in each char
freq: the frequency of the modes
When there is a tie in frequencies, I want to put all of the tied values in one cell, separated by commas. Or is there a better representation?
Question: I don't know how to deal with a tie.
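To make the two candidate representations concrete, here is a minimal sketch on a toy vector (the vector `x` and the names `tied`, `as_string` and `as_list` are made up for illustration and are not part of my data). It shows the tied values collapsed into one comma-separated string, and, as a possible alternative, kept as a character vector inside a list element:

```r
# Toy vector in which "a" and "b" tie for the highest count (hypothetical data)
x <- c("a", "b", "a", "b", "c")
tb <- table(x)

# All values whose count equals the maximum count
tied <- names(tb)[tb == max(tb)]

# Representation 1: one string per column, tied values separated by commas
as_string <- paste(tied, collapse = ", ")

# Representation 2: keep the tied values as a vector inside a list element,
# which could be stored in a list-column of a tibble
as_list <- list(tied)

print(as_string)
print(as_list)
```

The string form is easy to read and export; the list form keeps the values separate in case they need further processing.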
I have used the table() function to get the frequency tables of each column.
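library(openxlsx)  # assuming read.xlsx() comes from openxlsx
library(dplyr)     # provides %>% and as_tibble()
clean <- read.xlsx("test.xlsx", sheet = "clean") %>% as_tibble()
freqtb <- apply(clean, 2, table)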
Here is the second table I got in freqtb:
$休12
 个  休  天  饿
  1  33   2   1
Then I looped through the tables:
freq <- vector()
mode <- vector()
for (tb in freqtb) {
  max = max(tb)
  name = names(tb)[tb == max]
  freq <- append(freq, max)
  mode <- append(mode, name)
}
results <- data.frame(char = names(freqtb), freq = freq, mode = mode)
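The mode vector ends up longer than the other vectors, so it cannot be combined with them into results. I suspect this is due to ties: when several values share the maximum count, name has more than one element and append() adds all of them.
How can I get the same length for this "mode" vector?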
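For reference, here is a minimal sketch of one possible way to keep the lengths equal, assuming the comma-separated representation is acceptable. The toy data frame stands in for my spreadsheet (its columns `col1` and `col2` are hypothetical), and `lapply()` is used instead of `apply()` because `apply()` may simplify its result to a matrix when every table happens to have the same length:

```r
# Toy stand-in for the spreadsheet data (hypothetical values)
clean <- data.frame(
  col1 = c("a", "a", "b", "c"),  # single mode: "a"
  col2 = c("x", "y", "x", "y"),  # tie between "x" and "y"
  stringsAsFactors = FALSE
)

# One frequency table per column; lapply() always returns a list
freqtb <- lapply(clean, table)

# Highest count in each table: exactly one number per column
freq <- vapply(freqtb, max, numeric(1))

# Collapse tied values into a single string: exactly one string per column
mode <- vapply(
  freqtb,
  function(tb) paste(names(tb)[tb == max(tb)], collapse = ", "),
  character(1)
)

results <- data.frame(
  char = names(freqtb),
  mode = mode,
  freq = freq,
  row.names = NULL
)
print(results)
```

Because `paste(..., collapse = ", ")` always returns a single string, each column contributes exactly one element to `mode`, so all three vectors have the same length.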