I am new to R so be easy! I have two datasets in which two different samples (men and women) are asked the same questions (column names are identical). I want to run a t-test comparing the means of any two columns in each dataset but I can't figure out how to merge them into one dataset in a useful way. I have tried a few things like merge and rbind but they are not doing what I would like.

Here is a column in dataset 1. I would like to compare it with...

structure(list(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 
7, 4, 1, 5, 6)), class = "data.frame", row.names = c(NA, -17L
))

... this column in dataset 2 (as you can see, same column names).

structure(list(UVRATE2 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 
4, 4, 5, 1, 4)), class = "data.frame", row.names = c(NA, -17L
))
TarJae

2 Answers

4

You can combine the two columns into one data frame with a grouping variable and then run an unpaired two-sample t-test with `t.test`:

dataset1 <- data.frame(UVRATE1 = c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5))
# dataset1$UVRATE1
# [1] 38.9 61.2 73.3 21.8 63.4 64.6 48.4 48.8 48.5

dataset2 <- data.frame(UVRATE1 = c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4))
# dataset2$UVRATE1
# [1] 67.8 60.0 63.4 76.0 89.4 73.3 67.3 61.3 62.4

# Create a merged data frame
my_data <- data.frame( 
  group = rep(c("Woman", "Man"), each = 9),
  weight = c(dataset1$UVRATE1,  dataset2$UVRATE1)
)

# my_data
# group weight
# 1  Woman   38.9
# 2  Woman   61.2
# 3  Woman   73.3
# 4  Woman   21.8
# 5  Woman   63.4
# 6  Woman   64.6
# 7  Woman   48.4
# 8  Woman   48.8
# 9  Woman   48.5
# 10   Man   67.8
# 11   Man   60.0
# 12   Man   63.4
# 13   Man   76.0
# 14   Man   89.4
# 15   Man   73.3
# 16   Man   67.3
# 17   Man   61.3
# 18   Man   62.4

# Compute t-test (var.equal = TRUE gives the pooled-variance test)
res <- t.test(
  my_data[my_data$group == "Woman", ]$weight,
  my_data[my_data$group == "Man", ]$weight,
  var.equal = TRUE
)
res

# Two Sample t-test
# 
# data:  my_data[my_data$group == "Woman", ]$weight and my_data[my_data$group == "Man", ]$weight
# t = -2.7842, df = 16, p-value = 0.01327
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -29.748019  -4.029759
# sample estimates:
#   mean of x mean of y 
# 52.10000  68.98889 

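Do not forget to check the assumptions of the test: the observations should be independent, each group should be roughly normally distributed, and `var.equal = TRUE` additionally assumes that both groups have the same variance.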
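As a rough sketch of what such a check might look like for the example values above: the Shapiro-Wilk test and the F test used here are one common choice, not the only one, and the vector names `women` and `men` are only illustrative.

```r
# Same example values as in the answer above
women <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

# Normality within each group (Shapiro-Wilk test)
sw_women <- shapiro.test(women)
sw_men   <- shapiro.test(men)

# Equality of the two variances (F test)
vt <- var.test(women, men)

# If equal variances look doubtful, leave out var.equal = TRUE;
# t.test() then runs Welch's test, which does not assume them
welch <- t.test(women, men)
```

Printing `sw_women`, `sw_men`, `vt` and `welch` shows the usual test summaries. With samples this small, plots such as `qqnorm()` are often more informative than the p-values alone.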

ARAT
  • 1
    I am glad. If it provides an answer for you, please check it. – ARAT Feb 27 '21 at 19:09
  • 2
    With your data, it's simpler to use the formula interface, `t.test(weight ~ group, data = my_data, var.equal = TRUE)`. – Rui Barradas Feb 27 '21 at 19:21
  • Why bother to put them into the same dataframe? Why not just `t.test(dataset1$UVRATE1, dataset2$UVRATE1)`? – IRTFM Feb 27 '21 at 19:28
  • 1
    For sure. That is what the OP asked for! – ARAT Feb 27 '21 at 19:30
  • I tried `t.test(dataset1$UVRATE1, dataset2$UVRATE1)` before I asked the question, but I get this error: `Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missing value where TRUE/FALSE needed`, along with two warnings: `1: In mean.default(x) : argument is not numeric or logical: returning NA` and `2: In mean.default(y) : argument is not numeric or logical: returning NA` – Desman Wilson Feb 27 '21 at 19:38
  • Not sure why it would return NA if all of the values are numerical. Maybe I should have looked up the error for this simpler command. I thought I was just inputting a non-valid command. – Desman Wilson Feb 27 '21 at 19:42
  • @DesmanWilson: There was probably an error in the construction of one of the source dataframes. Perhaps you had a factor column that you didn't realize. Run `str()` on both dataframes and check the column types. In the future, including the data as text together with all prior error messages is the best way to let others understand what sort of errors exist. – IRTFM Feb 27 '21 at 21:15
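
A minimal sketch of that diagnosis, assuming the ratings were read in as text (the data frame `bad` is hypothetical; the asker's real data may differ). The warning "argument is not numeric or logical" is what `mean()` emits for such a column.

```r
# Hypothetical data frame whose ratings were imported as text, not numbers
bad <- data.frame(UVRATE1 = c("6", "6", "3", "7"), stringsAsFactors = FALSE)

str(bad)                              # reports 'chr' rather than 'num'
numeric_cols <- sapply(bad, is.numeric)
numeric_cols                          # FALSE flags the problem column

# Going through as.character() first also handles factor columns,
# where as.numeric() alone would return the internal level codes
bad$UVRATE1 <- as.numeric(as.character(bad$UVRATE1))
```

After the conversion the column is numeric and can be passed to `t.test()` as usual.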
0

Code:

# dataframe 1
dataset_1 <- data.frame(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 7, 4, 1, 5, 6))
# dataframe 2
dataset_2 <- data.frame(UVRATE1 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 4, 4, 5, 1, 4))

# change name of column in dataset2
colnames(dataset_2)[1] <- "UVRATE2"

# combine to one dataframe (cbind needs the same number of rows in both)
df <- cbind(dataset_1, dataset_2)

# t-test (Welch by default)
t.test(df$UVRATE1, df$UVRATE2)

Output:

    Welch Two Sample t-test

data:  df$UVRATE1 and df$UVRATE2
t = 1.0394, df = 31.128, p-value = 0.3066
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.622388  1.916506
sample estimates:
mean of x mean of y 
 5.352941  4.705882 
TarJae
  • 1
    The example of combining two columns of the same length into a single dataframe is not a good example to use for an independent samples t-test. In general the samples will have different lengths. – IRTFM Feb 27 '21 at 19:26
  • 1
    ... one part of the question was : "..but I can't figure out how to merge them (the two datasets) into one dataset ..." so all in all the newbie now knows how to cbind two columns in one dataframe. – TarJae Feb 27 '21 at 19:51
  • You should instead have shown how to combine two independent vectors. Teaching a new user how to "do things" that will later get them in trouble is not a useful service. – IRTFM Feb 27 '21 at 21:09
  • Thank you for your valuable comment. Another part of the question was: "...I want to run a t-test comparing the means of any two columns in each dataset...". That matches my experience of running independent t-tests on columns of a dataframe. I agree that for standalone vectors it is not meaningful to combine them into a dataframe just to run a t-test, but as far as I can tell that is not the case here. – TarJae Feb 28 '21 at 07:23
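
For samples that may differ in length, one possible approach is to stack the two data frames row-wise with a group label instead of binding them column-wise. This is a sketch only: it assumes both data frames really use the column name `UVRATE1`, and the `group` column and its labels `sample1`/`sample2` are made up here, since the thread does not say which dataset holds which sample.

```r
# Values taken from the question
dataset1 <- data.frame(UVRATE1 = c(6, 6, 3, 7, 7, 7, 4, 6, 6, 6, 6, 4, 7, 4, 1, 5, 6))
dataset2 <- data.frame(UVRATE1 = c(4, 1, 3, 5, 6, 7, 7, 4, 7, 4, 7, 7, 4, 4, 5, 1, 4))

# Tag each row with the sample it came from (labels are illustrative)
dataset1$group <- "sample1"
dataset2$group <- "sample2"

# rbind stacks rows, so the two samples may have different sizes
combined <- rbind(dataset1, dataset2)

# Formula interface: outcome ~ grouping variable (Welch's test by default)
res <- t.test(UVRATE1 ~ group, data = combined)
res
```

Because identical column names are exactly what `rbind` needs, no renaming is required, and the same call works for any other shared column.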