I'm analyzing questions in a survey where each response has been assigned to one of three clusters. Example data:
library(tidyverse)
Do.You.Live.in.the.USA <- as.factor(c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"))
Whats.your.favorite.color <- as.factor(c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Green", "Blue"))
Cluster <- c(1,2,3,3,2,1,3,2,1)
survey_data <- data.frame(Do.You.Live.in.the.USA, Whats.your.favorite.color, Cluster)
survey_data[] <- lapply(survey_data, factor)
The survey responses have been split into three data frames, each representing one cluster:
cluster_1_df <- survey_data %>%
  filter(Cluster == "1") %>%
  select(-Cluster)
cluster_2_df <- survey_data %>%
  filter(Cluster == "2") %>%
  select(-Cluster)
cluster_3_df <- survey_data %>%
  filter(Cluster == "3") %>%
  select(-Cluster)
I'd like to create a summary for each cluster and merge the summaries into a matrix so I can visualize them later. Something like:
cluster_1 <- summary(cluster_1_df$Do.You.Live.in.the.USA)
cluster_2 <- summary(cluster_2_df$Do.You.Live.in.the.USA)
cluster_3 <- summary(cluster_3_df$Do.You.Live.in.the.USA)
US_live_summary <- cbind(cluster_1, cluster_2, cluster_3)
Doing this by hand for many survey questions will become laborious, so I'd like to wrap it in a function that takes the question as an argument. However, I run into a problem:
clust_sum_fun <- function(x){
cbind(summary(cluster_1_df$x), summary(cluster_2_df$x), summary(cluster_3_df$x))
}
US_live_summary <- clust_sum_fun(Do.You.Live.in.the.USA)
... returns blank values.
I suspect the problem is in how the column name is passed into the function and then used after the $ operator. Can anyone suggest a solution, please?
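To make the goal concrete, here is a minimal sketch of the kind of function I'm after. It rests on assumptions I haven't confirmed are the best approach: the column name is passed as a quoted string, and the column is looked up with [[ ]] instead of $, since $x looks for a column literally named "x". The sketch rebuilds the example data with base R's subset() rather than dplyr so that it runs on its own; the cluster_1, cluster_2, cluster_3 labels inside cbind() are just names I chose for the matrix columns.

```r
# Rebuild the example data (base R only, so the sketch is self-contained).
survey_data <- data.frame(
  Do.You.Live.in.the.USA = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  Whats.your.favorite.color = c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Green", "Blue"),
  Cluster = c(1, 2, 3, 3, 2, 1, 3, 2, 1)
)
survey_data[] <- lapply(survey_data, factor)

# One data frame per cluster, with the Cluster column dropped.
cluster_1_df <- subset(survey_data, Cluster == "1", select = -Cluster)
cluster_2_df <- subset(survey_data, Cluster == "2", select = -Cluster)
cluster_3_df <- subset(survey_data, Cluster == "3", select = -Cluster)

# Assumption: x is a column name given as a string.
# df[[x]] uses the value of x as the column name, whereas df$x
# would look for a column that is literally called "x".
clust_sum_fun <- function(x) {
  cbind(
    cluster_1 = summary(cluster_1_df[[x]]),
    cluster_2 = summary(cluster_2_df[[x]]),
    cluster_3 = summary(cluster_3_df[[x]])
  )
}

US_live_summary <- clust_sum_fun("Do.You.Live.in.the.USA")
color_summary <- clust_sum_fun("Whats.your.favorite.color")
```

Because the factor levels are kept when the data is subset, each summary has the same length and row order, so the columns should line up in cbind() even when a level has no responses in a cluster.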