
New to R and struggling with how best to visualize my results. I'm also new to Stack Overflow, so I apologize if my formatting is incorrect or hard to read.

There are four relationships (or lack thereof) I'd like to visualize.

I've made box-and-whisker plots, bar graphs, and scatter plots before, but I haven't needed to solve these particular problems. I've posted my code and its output under each question.

I've used mean() to compare the averages of the entire sample, as well as of the control and experimental groups.

The functions used were mean() and lm().

I have reviewed the various graphing functions in the R man pages, but I've had trouble figuring out how to apply them to my findings. Any help or guidance is greatly appreciated.

Visualizing no difference in means between control and treatment, using mean()

#treatment/control variable
> clean$treatment <- NA
> clean$treatment[clean$prime_Page.Submit >= 0] <- 1
> clean$treatment[clean$control_Page.Submit >= 0] <- 0
>
> table(clean$treatment, useNA = "ifany")

0    1 <NA>
23   20  220
>
>
> #1) Did the experimental group have higher attrition?
> #subset for difference in progress
>
> #this creates a table
> table(clean$Progress, useNA = "ifany")

2    3    4    7   18   45   66  100 <NA>
3    5    3    1    1    2    1   43  204
>
> #only treated people
> treated <- subset(clean, treatment == 1)
> table(treated$treatment)

1
20
>
> #only control people
> control <- subset(clean, treatment == 0)
> table(control$treatment)

0
23
>
> #run mean testing
> mean(clean$Progress)
[1] NA
> mean(control$Progress)
[1] 100
> mean(treated$Progress)
[1] 100
>
> #they are the same!
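To visualize this "no difference" result, a simple option might be a bar plot of the two group means. This is only a sketch in base R, assuming the `clean` data frame and `treatment` coding above:

```r
# Sketch, assuming `clean` has the `treatment` (0/1) and `Progress` columns above.
# tapply() computes the mean Progress per group, dropping NAs.
group_means <- tapply(clean$Progress, clean$treatment, mean, na.rm = TRUE)

# Base-R bar plot of the two (identical) group means.
barplot(group_means,
        names.arg = c("Control", "Treatment"),
        ylab = "Mean Progress (%)",
        ylim = c(0, 100),
        main = "Mean progress by group")
```

Since both means are 100, the bars would be identical; plotting the raw points as well, e.g. `stripchart(Progress ~ treatment, data = clean)`, might show the lack of difference more honestly.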

Do people who self-report being in a particular group have an increased attrition rate? Using lm(), and lm() again after recoding Q162 to binary to make any effect more visible (none found)

table(clean$Progress, useNA = "ifany")

2    3    4    7   18   45   66  100 <NA>
3    5    3    1    1    2    1   43  204
> clean$dropped_out <- NA
> clean$dropped_out[clean$Progress >= 87] <- 1
> clean$dropped_out[clean$Progress <= 86] <- 0
> table(clean$dropped_out, useNA = "ifany")

0    1 <NA>
16   43  204
> #
>
>
> #Logit predicting dropping out with Q162
> Logit_1 <- lm(dropped_out ~ Q162, data = clean, family = "binomial")
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument ‘family’ will be disregarded
> summary(Logit_1)

Call:
lm(formula = dropped_out ~ Q162, data = clean, family = "binomial")

Residuals:
Min       1Q   Median       3Q      Max
-0.93010  0.08111  0.09232  0.09232  0.10352

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.79562    0.33665   2.363   0.0226 *
Q162         0.01121    0.03187   0.352   0.7268
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2877 on 44 degrees of freedom
(217 observations deleted due to missingness)
Multiple R-squared:  0.002801,  Adjusted R-squared:  -0.01986
F-statistic: 0.1236 on 1 and 44 DF,  p-value: 0.7268

>
> #did not have an effect on the drop out rate
>
>
> #Recoding 162 to binary to increase effect if any exists
> table(clean$Q162, useNA = "ifany")

9   10   11   12   13   15 <NA>
10   19    9    3    4    1  217
> clean$m_o <- NA
> clean$m_o[clean$Q162 >= 13] <- 1
> clean$m_o[clean$Q162 <= 11] <- 0
> table(clean$m_o, useNA = "ifany")

0    1 <NA>
38    5  220

> LOgit_2 <- lm(dropped_out ~ m_o, data = clean, family = "binomial")
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra argument ‘family’ will be disregarded
> summary(LOgit_2)

Call:
lm(formula = dropped_out ~ m_o, data = clean, family = "binomial")

Residuals:
Min       1Q   Median       3Q      Max
-0.92105  0.07895  0.07895  0.07895  0.07895

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.92105    0.04211  21.871   <2e-16 ***
m_o  0.07895    0.12350   0.639    0.526
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2596 on 41 degrees of freedom
(220 observations deleted due to missingness)
Multiple R-squared:  0.009868,  Adjusted R-squared:  -0.01428
F-statistic: 0.4086 on 1 and 41 DF,  p-value: 0.5262
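A side note on the warning above: lm() ignores the family argument, so both models are actually linear probability models, not logits. If a true logistic regression is intended, glm() would presumably be the call to use; a sketch, assuming the same `clean` data frame:

```r
# glm(), not lm(), accepts a family argument; this fits an actual logit.
Logit_1 <- glm(dropped_out ~ Q162, data = clean, family = binomial)
summary(Logit_1)

# The fitted probabilities could then be plotted against Q162 to visualize
# the (non-)effect of Q162 on dropping out.
```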

OLS regression and Welch Two Sample t-test, using lm() and t.test()

#OLS and T.test 
> OLS_Results_1 <- lm(Progress ~ Q162, data = clean)
> summary(OLS_Results_1)

Call:
lm(formula = Progress ~ Q162, data = clean)

Residuals:
Min      1Q  Median      3Q     Max
-76.775   4.573   5.225   5.225   5.876

Coefficients:
Estimate Std. Error t value  Pr(>|t|)
(Intercept)  88.2599    19.9466   4.425 0.0000627 ***
Q162          0.6515     1.8884   0.345     0.732
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.05 on 44 degrees of freedom
(217 observations deleted due to missingness)
Multiple R-squared:  0.002698,  Adjusted R-squared:  -0.01997
F-statistic: 0.119 on 1 and 44 DF,  p-value: 0.7317

>
> t.test(clean$Progress ~ clean$m_o, alternative = c("two.sided"))

Welch Two Sample t-test

data:  clean$Progress by clean$m_o
t = -1.676, df = 37, p-value = 0.1022
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.9401469  0.9401469
sample estimates:
mean in group 0 mean in group 1
95.5           100.0
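For the t-test comparison, a boxplot of Progress split by the recoded group might be the most direct picture. Again a sketch, assuming the `clean` data frame and `m_o` recode above:

```r
# Boxplot of Progress by the binary m_o recode; rows with NA in m_o are
# dropped automatically by the formula interface.
boxplot(Progress ~ m_o, data = clean,
        names = c("m_o = 0", "m_o = 1"),
        ylab = "Progress (%)",
        main = "Progress by m_o group")
```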
  • Hi, Welcome to SO! You might get a better response if you split up your questions into separate question posts. And with each post include some [sample data](https://stackoverflow.com/a/5963610/5456906) so other users have something to work with. – xilliam Dec 15 '21 at 11:03
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Dec 15 '21 at 11:03
