This is probably a simple question for people better at dplyr: I'd like to compute a frequency list of character data in a data frame.
Toy data:
df <- data.frame(
  id = sample(1:5, 100, replace = TRUE),
  v1 = sample(c(NA, rnorm(10)), 100, replace = TRUE),
  v2 = sample(LETTERS, 100, replace = TRUE)
)
My attempt so far:
Let's assume df first needs to be filtered on a number of variables. Once that is done, I can compute the frequency list, but the output does not show the respective character values, so I don't know which value has which frequency:
library(dplyr)
df %>%
  filter(!is.na(v1) & id != lag(id)) %>%
  summarise(freq = sort(prop.table(table(v2)), decreasing = TRUE) * 100)
freq
1 7.692308
2 6.410256
3 5.128205
4 5.128205
5 5.128205
6 5.128205
7 5.128205
8 5.128205
9 5.128205
10 5.128205
output clipped ...
So what I need is a second column showing the values (A, B, C, etc.) that the frequencies belong to. How can that be achieved?
EDIT:
Oops, I think I got it:
df %>%
  filter(!is.na(v1) & id != lag(id)) %>%
  summarise(freq = sort(prop.table(table(v2)), decreasing = TRUE) * 100,
            value = names(sort(prop.table(table(v2)), decreasing = TRUE)))
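For comparison, here is a minimal sketch of another way to get the same two columns without building the table twice. It relies on dplyr's count() to keep the letter next to its count, then turns the counts into percentages. The set.seed() call and the name freq_tbl are my own additions for illustration, not part of the original problem. This form may also be worth considering on newer dplyr versions: as far as I know, since dplyr 1.1.0 a summarise() call that returns more than one row per group triggers a deprecation warning pointing to reframe().

```r
library(dplyr)

# Assumption: a seed is set only so the toy data is reproducible
set.seed(1)
df <- data.frame(
  id = sample(1:5, 100, replace = TRUE),
  v1 = sample(c(NA, rnorm(10)), 100, replace = TRUE),
  v2 = sample(LETTERS, 100, replace = TRUE)
)

# freq_tbl is a hypothetical name for the result
freq_tbl <- df %>%
  filter(!is.na(v1), id != lag(id)) %>%  # same filter conditions as in my attempt
  count(v2) %>%                          # one row per letter, count in column n
  mutate(freq = n / sum(n) * 100) %>%    # convert counts to percentages
  arrange(desc(freq))                    # most frequent letters first

head(freq_tbl)
```

The result has the columns v2, n and freq, so each percentage sits next to the letter it belongs to, and the raw counts are kept as well.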