
I'm analyzing questions in a survey where each response has been assigned to one of three clusters. Example data:

library(tidyverse)

Do.You.Live.in.the.USA <- as.factor(c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"))
Whats.your.favorite.color <- as.factor(c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Green", "Blue"))
Cluster <- c(1,2,3,3,2,1,3,2,1)

survey_data <- data.frame(Do.You.Live.in.the.USA, Whats.your.favorite.color, Cluster)
survey_data[] <- lapply(survey_data, factor)

The survey responses have been split into three data frames, one per cluster:

cluster_1_df <- survey_data %>%
  filter(Cluster=="1") %>% 
  select(-Cluster)
cluster_2_df <- survey_data %>%
  filter(Cluster=="2") %>% 
  select(-Cluster)
cluster_3_df <- survey_data %>%
  filter(Cluster=="3") %>% 
  select(-Cluster)

I'd like to create a summary for each cluster and merge those back together into a matrix so I can visualize it later. Something like:

cluster_1  <- summary(cluster_1_df$Do.You.Live.in.the.USA)
cluster_2  <- summary(cluster_2_df$Do.You.Live.in.the.USA)
cluster_3  <- summary(cluster_3_df$Do.You.Live.in.the.USA)
US_live_summary <- cbind(cluster_1, cluster_2, cluster_3)

Doing this by hand for many survey questions will become laborious, so I'd like to wrap it in a function that I can apply to each question. But I run into a problem:

clust_sum_fun <- function(x){
  cbind(summary(cluster_1_df$x), summary(cluster_2_df$x),  summary(cluster_3_df$x))
}

US_live_summary <- clust_sum_fun(Do.You.Live.in.the.USA)

... returns blank values.

I suspect the problem is in how the column name is passed to and used inside the function. Can anyone suggest a solution, please?

nycrefugee

1 Answer


Here's a direct approach to do that:

clust_sum_fun <- function(x)
  cbind(summary(cluster_1_df[, x]), summary(cluster_2_df[, x]), summary(cluster_3_df[, x]))  
(US_live_summary <- clust_sum_fun("Do.You.Live.in.the.USA"))
#     [,1] [,2] [,3]
# No     1    2    1
# Yes    2    1    2

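One issue was that by writing Do.You.Live.in.the.USA without quotes you are passing not a column name but the variable Do.You.Live.in.the.USA (which had been defined in the global environment, hence there was no error). The other issue was using $x: the $ operator does not evaluate x, so it looks for a column literally named "x", finds none, and returns NULL. Using [, x] subsetting fixes this, with x now being a character string holding the column name.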
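
As a possible extension, the per-cluster data frames can be skipped by cross-tabulating each question against Cluster directly. The sketch below uses base R only and rebuilds the example data from the question. The helper name clust_tab_fun and its data argument are hypothetical, chosen for illustration, and this is one way to do it rather than the only one.

```r
# Rebuild the example data from the question (base R only).
survey_data <- data.frame(
  Do.You.Live.in.the.USA = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  Whats.your.favorite.color = c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Green", "Blue"),
  Cluster = c(1, 2, 3, 3, 2, 1, 3, 2, 1)
)
survey_data[] <- lapply(survey_data, factor)

# `$` does not evaluate its right-hand side: this looks for a column named "x".
x <- "Do.You.Live.in.the.USA"
dollar_result <- survey_data$x        # NULL, because no such column exists
bracket_result <- survey_data[[x]]    # the factor column named by the string in x

# Hypothetical helper: cross-tabulate one question (given as a string) by Cluster.
clust_tab_fun <- function(question, data = survey_data) {
  table(data[[question]], data$Cluster, dnn = c(question, "Cluster"))
}

US_live_summary <- clust_tab_fun("Do.You.Live.in.the.USA")

# Apply the helper to every question column, keeping the question names.
questions <- setdiff(names(survey_data), "Cluster")
all_summaries <- lapply(setNames(questions, questions), clust_tab_fun)
```

Here US_live_summary holds the same counts as the cbind() result above, with the clusters as labelled columns, and all_summaries is a named list with one table per question.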

Julius Vainora