I'm editing this question because I think the following explains my issue, and what I'm trying to achieve, much better. I wrote it to a friend, who recommended I come back to the experts here!
I've combined three data sets into one, cleaned up the names with mutate, and gotten myself a nice clean data set to work with.
From that data, I've been able to select a subset of columns, group by two fields, and then run a mean / sd across all columns using:
GroupedMeans <- cleanuxq2 %>%
  dplyr::select(starts_with("X"), "list", "urespid_0", "segment") %>%
  group_by(list, segment) %>%
  summarize(across(
    .cols = where(is.numeric),
    .fns = list(Mean = mean, SD = sd)
    # , .names = "{.col}_{.fn}"
  ))
I've been able to go one step further and even create topbox / middlebox / bottom box scores within a function and pass it through to a copy of the code above:
#this function calculates the top/middle/bottom boxes and is then applied across all columns -----
myfunc <- list(
topbox = ~sum(. > 5)/n(),
middlebox = ~sum((. ==3)+(. ==4)+(. ==5))/n(),
bottombox = ~sum(. < 3)/n(),
X_n = ~n())
#this creates the boxes using the function above --------
UXQ_Boxes <- cleanuxq2 %>%
  dplyr::select(starts_with("X"), "list", "urespid_0", "segment") %>%
  group_by(segment, list) %>%
  summarize(across(
    .cols = where(is.numeric),
    .fns = myfunc
    # , .names = "{.col}_{.fn}"
  ))
The next step was to translate the text characters into doubles, so I've done this:
#this translates char to double and move into new frame------
cleanuxq3 <- cleanuxq2 %>%
mutate_at(vars(ends_with("effectiveness")),
~as.double(recode(.,
"Success" = 0,
"Timeout" = 1,
"Abandon" = 2)))%>%
mutate_at(vars(ends_with("pass_fail")),
~as.double(recode(.,
"V"=0,
"D"=1,
"P"=3,
"N"=4)))%>%
mutate_at(vars(ends_with("exp_difficulty")),
~as.double(recode(.,
"Yes"=0,
"No"=1)))
This is where things go wrong for me. Here's an example:
UXQ_Tasks1 <- cleanuxq3 %>%
dplyr:: select(("list"),("urespid_0"),("segment"),starts_with("t")) %>%
group_by (list,segment)%>%
summarize(
seconds = (mean(cleanuxq3$t1_time_task))/60,
UniquePages = (mean(cleanuxq3$t1_unique_pageviews))
# .names = "{col}_{fn}"
)
There are two issues with this. The first is that it gives inaccurate means when I have both segments in there: it appears to compute the mean over all the data, not by 'list' and then 'segment'.
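To make the first issue concrete, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not my real file). As far as I can tell, `cleanuxq3$t1_time_task` always pulls the whole column regardless of grouping, whereas a bare column name inside `summarize()` is evaluated per group:

```r
library(dplyr)

# Hypothetical toy data standing in for cleanuxq3
toy <- data.frame(
  list                = c("A", "A", "B", "B"),
  segment             = c("s1", "s2", "s1", "s2"),
  t1_time_task        = c(60, 120, 180, 240),
  t1_unique_pageviews = c(1, 2, 3, 4)
)

# toy$... ignores the grouping: every group gets the overall mean
overall <- toy %>%
  group_by(list, segment) %>%
  summarize(seconds = mean(toy$t1_time_task) / 60, .groups = "drop")

# Bare column names are evaluated within each list/segment group
by_group <- toy %>%
  group_by(list, segment) %>%
  summarize(
    seconds     = mean(t1_time_task) / 60,
    UniquePages = mean(t1_unique_pageviews),
    .groups     = "drop"
  )
```

On this toy data `overall$seconds` is the same value in every row, while `by_group$seconds` differs per group, which matches the symptom I'm seeing.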
The second problem (and if you tell me to just do it manually, I will) is that I have repeating columns whose names increase sequentially, e.g. t1_effectiveness, t2_effectiveness, t3_effectiveness.
What I would ideally like to do is:
1. be able to group by 'list'
2. subgroup by 'segment'
3. if possible, subgroup by the field 't1_pass_fail' (note the t1 ask below)
4. because I have 12 fields for each 't1', up to 't10', I'd like to do a loop so that I can apply the same formula to each variable, looping through t1 - t10: if ends_with("effectiveness"), sum(.)/n(); if ends_with("satisfied"), mean(.)/7; etc.

I realise 4 is an ask, but I just assume you're far more brilliant than I :)
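A rough sketch of the kind of result I'm after, again on made-up data (the `toy` data frame, its values, and the `_rate` / `_score` suffixes are all hypothetical). It assumes that one `across()` call per column suffix can stand in for an explicit loop over t1 - t10, since `ends_with()` should pick up every matching task column:

```r
library(dplyr)

# Hypothetical toy data: two tasks, two measures per task
toy <- data.frame(
  list             = c("A", "A", "B", "B"),
  segment          = c("s1", "s1", "s1", "s1"),
  t1_effectiveness = c(0, 1, 2, 0),
  t2_effectiveness = c(0, 0, 1, 1),
  t1_satisfied     = c(7, 5, 3, 1),
  t2_satisfied     = c(7, 7, 7, 7)
)

task_summary <- toy %>%
  group_by(list, segment) %>%
  summarize(
    # sum(.)/n() for every column ending in "effectiveness"
    across(ends_with("effectiveness"), ~ sum(.x) / n(),
           .names = "{.col}_rate"),
    # mean(.)/7 for every column ending in "satisfied"
    across(ends_with("satisfied"), ~ mean(.x) / 7,
           .names = "{.col}_score"),
    .groups = "drop"
  )
```

I assume the third level of grouping would just mean adding `t1_pass_fail` to `group_by()`, but I haven't worked out how that should behave for t2 - t10.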
Here is a version of my datafile