I'm new to R and struggling with how best to visualize my results. I'm also new to Stack Overflow, so I apologize if my formatting is incorrect.
There are four relationships (or lack thereof) I'd like to visualize.
I've made whisker plots, bar graphs, and scatter plots before, but I haven't had to solve these particular problems. I've posted my code and its output under each question.
I've used mean() to compare the average of the entire sample with the averages of the control and experimental groups.
The functions used were mean(), lm(), and t.test().
I have reviewed the various graphs in the R man pages, but I've had trouble figuring out how to apply them to my findings. Any help or guidance is greatly appreciated.
Visualizing no difference in means between control and treatment, using mean()
#treatment/control variable
> clean$treatment <- NA
> clean$treatment[clean$prime_Page.Submit >= 0] <- 1
> clean$treatment[clean$control_Page.Submit >= 0] <- 0
>
> table(clean$treatment, useNA = "ifany")
   0    1 <NA>
  23   20  220
>
>
> #1) Did the experimental group have higher attrition
> #subset for difference in progress
>
> #this creates a table
> table(clean$Progress, useNA = "ifany")
   2    3    4    7   18   45   66  100 <NA>
   3    5    3    1    1    2    1   43  204
>
> #only treated people
> treated <- subset(clean, treatment == 1)
> table(treated$treatment)
1
20
>
> #only control people
> control <- subset(clean, treatment == 0)
> table(control$treatment)
0
23
>
> #run mean testing
> mean(clean$Progress)
[1] NA
> mean(control$Progress)
[1] 100
> mean(treated$Progress)
[1] 100
>
> #they are the same!
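One possible way to show "no difference" is to plot every observation per group and mark each group mean on top. The sketch below is only an illustration: the data frame `demo` and all of its values are invented, and only the column names `Progress` and `treatment` come from the code above. It also uses `na.rm = TRUE`, since `mean()` returns `NA` whenever the vector contains missing values, which is likely why `mean(clean$Progress)` printed `NA`.

```r
# Hypothetical stand-in for `clean`; every value here is invented for illustration.
demo <- data.frame(
  Progress  = c(100, 100, 100, 100, 100, 100, 45, 3, NA, NA),
  treatment = c(0, 0, 0, 1, 1, 1, NA, NA, NA, NA)
)

# mean() returns NA if any value is missing; na.rm = TRUE drops the NAs first.
overall_mean <- mean(demo$Progress, na.rm = TRUE)

# Keep only rows that were assigned to an arm, then average within each arm.
arms <- demo[!is.na(demo$treatment), ]
group_means <- tapply(arms$Progress, arms$treatment, mean, na.rm = TRUE)

# Jittered strip chart of the raw points, one column per arm.
stripchart(Progress ~ treatment, data = arms, method = "jitter",
           vertical = TRUE, pch = 16, ylim = c(0, 100),
           group.names = c("control", "treatment"),
           ylab = "Progress (%)")

# Overlay the group means as red diamonds.
points(seq_along(group_means), group_means, pch = 18, col = "red", cex = 2)
```

With identical means the two diamonds sit at the same height, which makes the lack of a difference visible at a glance. A plain `barplot(group_means)` would also work, but it hides how many points are behind each bar.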
Do people who self-report being in a particular group have an increased attrition rate? Using lm(), and lm() again after recoding to increase the visibility of any effect (none found)
> table(clean$Progress, useNA = "ifany")
   2    3    4    7   18   45   66  100 <NA>
   3    5    3    1    1    2    1   43  204
> clean$dropped_out <- NA
> clean$dropped_out[clean$Progress >= 87] <- 1
> clean$dropped_out[clean$Progress <= 86] <- 0
> table(clean$dropped_out, useNA = "ifany")
   0    1 <NA>
  16   43  204
> #
>
>
> #Logit predicting dropping out with Q162
> Logit_1 <- lm(dropped_out ~ Q162, data = clean, family = "binomial")
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument ‘family’ will be disregarded
> summary(Logit_1)
Call:
lm(formula = dropped_out ~ Q162, data = clean, family = "binomial")
Residuals:
Min 1Q Median 3Q Max
-0.93010 0.08111 0.09232 0.09232 0.10352
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.79562 0.33665 2.363 0.0226 *
Q162 0.01121 0.03187 0.352 0.7268
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2877 on 44 degrees of freedom
(217 observations deleted due to missingness)
Multiple R-squared: 0.002801, Adjusted R-squared: -0.01986
F-statistic: 0.1236 on 1 and 44 DF, p-value: 0.7268
>
> #Q162 did not have an effect on the dropout rate
>
>
> #Recoding Q162 to binary to make any effect easier to see, if one exists
> table(clean$Q162, useNA = "ifany")
   9   10   11   12   13   15 <NA>
  10   19    9    3    4    1  217
> clean$m_o <- NA
> clean$m_o[clean$Q162 >= 13] <- 1
> clean$m_o[clean$Q162 <= 11] <- 0
> table(clean$m_o, useNA = "ifany")
   0    1 <NA>
  38    5  220
> LOgit_2 <- lm(dropped_out ~ m_o, data = clean, family = "binomial")
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument ‘family’ will be disregarded
> summary(LOgit_2)
Call:
lm(formula = dropped_out ~ m_o, data = clean, family = "binomial")
Residuals:
Min 1Q Median 3Q Max
-0.92105 0.07895 0.07895 0.07895 0.07895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.92105 0.04211 21.871 <2e-16 ***
m_o 0.07895 0.12350 0.639 0.526
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2596 on 41 degrees of freedom
(220 observations deleted due to missingness)
Multiple R-squared: 0.009868, Adjusted R-squared: -0.01428
F-statistic: 0.4086 on 1 and 41 DF, p-value: 0.5262
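The warning in both fits says that `lm()` discards the `family` argument, so the models above are ordinary linear regressions rather than logits. A logistic regression would normally go through `glm()`. The sketch below is a minimal illustration under stated assumptions: `demo` and its values are invented, and only the names `dropped_out` and `m_o` are taken from the code above. Note also that, as coded above, `dropped_out` is 1 when `Progress >= 87`, so 1 appears to mark people who finished; the sketch keeps the name but does not rely on either reading.

```r
# Hypothetical data; the values are invented so the model can be fitted.
demo <- data.frame(
  dropped_out = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1),
  m_o         = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)

# glm() accepts `family`; lm() ignores it, which is what the warning reports.
logit_fit <- glm(dropped_out ~ m_o, data = demo, family = binomial)
summary(logit_fit)

# Share of dropped_out == 1 within each m_o group.
props <- tapply(demo$dropped_out, demo$m_o, mean)

# Bar chart of the two proportions on a common 0 to 1 scale.
barplot(props, ylim = c(0, 1), names.arg = c("m_o = 0", "m_o = 1"),
        ylab = "Proportion with dropped_out = 1")
```

For a binary outcome against a binary predictor, two bars of proportions are usually enough to show that the groups barely differ. With very small groups (such as the 5 cases with `m_o = 1` above) the logit estimate can be unstable, so the plot may be more informative than the coefficient table.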
OLS regression using lm() and Welch two-sample t-test using t.test()
#OLS and T.test
> OLS_Results_1 <- lm(Progress ~ Q162, data = clean)
> summary(OLS_Results_1)
Call:
lm(formula = Progress ~ Q162, data = clean)
Residuals:
Min 1Q Median 3Q Max
-76.775 4.573 5.225 5.225 5.876
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 88.2599 19.9466 4.425 0.0000627 ***
Q162 0.6515 1.8884 0.345 0.732
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 17.05 on 44 degrees of freedom
(217 observations deleted due to missingness)
Multiple R-squared: 0.002698, Adjusted R-squared: -0.01997
F-statistic: 0.119 on 1 and 44 DF, p-value: 0.7317
>
> t.test(clean$Progress ~ clean$m_o, alternative = c("two.sided"))
Welch Two Sample t-test
data: clean$Progress by clean$m_o
t = -1.676, df = 37, p-value = 0.1022
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.9401469 0.9401469
sample estimates:
mean in group 0 mean in group 1
95.5 100.0
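For results like these, one option is a scatter plot with the fitted line for the OLS model, next to a plot of the estimated difference in means with its confidence interval for the t-test. The following is a rough sketch, not a definitive recipe: `demo` and its values are invented, and only the names `Progress`, `Q162`, and `m_o` and the recoding cut-offs (13 and above, 11 and below) are taken from the code above.

```r
# Hypothetical data; the values are invented for illustration.
demo <- data.frame(
  Q162     = c(9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 15),
  Progress = c(100, 45, 100, 100, 66, 100, 100, 100, 100, 100, 66)
)

# Same recoding rule as above: >= 13 becomes 1, <= 11 becomes 0, 12 stays NA.
demo$m_o <- ifelse(demo$Q162 >= 13, 1, ifelse(demo$Q162 <= 11, 0, NA))

ols_fit <- lm(Progress ~ Q162, data = demo)
welch   <- t.test(Progress ~ m_o, data = demo)

# Two panels side by side; restore the previous layout afterwards.
old_par <- par(mfrow = c(1, 2))

# Panel 1: raw points (jittered horizontally) with the fitted OLS line.
plot(jitter(demo$Q162), demo$Progress, pch = 16,
     xlab = "Q162", ylab = "Progress (%)")
abline(ols_fit, col = "red")

# Panel 2: difference in means (group 0 minus group 1) with its 95% CI.
diff_est <- unname(welch$estimate[1] - welch$estimate[2])
plot(diff_est, 1, pch = 16, yaxt = "n", ylab = "",
     xlim = range(c(welch$conf.int, 0)),
     xlab = "Difference in mean Progress")
segments(welch$conf.int[1], 1, welch$conf.int[2], 1)
abline(v = 0, lty = 2)  # dashed reference line at "no difference"

par(old_par)
```

A nearly flat red line in the first panel and an interval that crosses the dashed line at zero in the second both convey "no detectable effect" without needing the coefficient table.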