
I have the following paired observations in long format. I want to run a paired t-test across the time variable in R while keeping the data in long format. To do that, I first need to detect the observations that are not present at both time 1 and time 2 (B and E in this case), and then perhaps create a new data frame with the remaining observations in a consistent order. Is there a way to do this without first reshaping the data into wide format? Help and suggestions would be appreciated; I'm new to R.

obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9   
  • Search for "reshape long to wide" to match up your observations from the two different times on the same row. Maybe start here: https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format – MrFlick Jan 02 '18 at 15:26

2 Answers

As an alternative to the use of duplicated in @CPak's long-format answer you can group by the observation and filter for where the count of the observations is not equal to 1:

library(dplyr)

p <- df %>%
  group_by(obs) %>%
  filter(n() != 1) %>%
  arrange(time, obs) %>%
  ungroup()

Either way the filtered data are the same, so the t-test shown in @CPak's answer gives the same result:

ans <- with(p, t.test(value ~ time, paired=TRUE))

> ans

    Paired t-test

data:  value by time
t = -3.3699, df = 4, p-value = 0.02805
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.6264228 -0.2535772
sample estimates:
mean of the differences 
                  -1.44 
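
A note on the filter: n() != 1 would also keep an observation that appears three times, or twice at the same time point. A minimal sketch of a stricter version, assuming each obs should have exactly one row at each of the two time points (p_strict is just an illustrative name, and the data are copied from the question):

```r
library(dplyr)

# Data copied from the question
df <- read.table(text = "obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

p_strict <- df %>%
  group_by(obs) %>%
  # keep only groups with exactly two rows covering two distinct time points
  filter(n() == 2, n_distinct(time) == 2) %>%
  ungroup() %>%
  arrange(time, obs)
```

On this data set the result is the same ten rows as above; the extra condition only matters if the data contain duplicated or extra measurements.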
Stewart Ross

You can use duplicated() in both the forward and reverse (fromLast=TRUE) directions to keep only the observations that appear more than once:

library(dplyr)
p <- df %>%
       filter(duplicated(obs) | duplicated(obs, fromLast=TRUE)) %>%
       arrange(time, obs)

#    obs time value
# 1    A    1   5.5
# 2    C    1   4.3
# 3    D    1   6.4
# 4    F    1   5.6
# 5    G    1   6.6
# 6    A    2   6.5
# 7    C    2   6.7
# 8    D    2   7.8
# 9    F    2   5.7
# 10   G    2   8.9

Then perform the paired t.test

ans <- with(p, t.test(value ~ time, paired=TRUE))

#   Paired t-test
#
# data:  value by time
# t = -3.3699, df = 4, p-value = 0.02805
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -2.6264228 -0.2535772
# sample estimates:
# mean of the differences 
#                   -1.44 
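
One caveat: as far as I know, R 4.4.0 and later reject paired=TRUE in the formula method of t.test, so the call above may stop with an error on a recent installation. A sketch of the same test using the two-vector interface instead, which relies on the rows being sorted by time and then obs so that the two vectors line up (res is just an illustrative name):

```r
# Data copied from the question
df <- read.table(text = "obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

# Same filter as above, written in base R
p <- df[duplicated(df$obs) | duplicated(df$obs, fromLast = TRUE), ]

# Sort so that row i at time 1 and row i at time 2 belong to the same obs
p <- p[order(p$time, p$obs), ]

# Two-vector interface: differences are computed as time 1 minus time 2
res <- with(p, t.test(value[time == 1], value[time == 2], paired = TRUE))
res
```

The differences are taken as time 1 minus time 2, which matches the sign of the estimate in the output above.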

Your original data

df <- read.table(text="obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9", header=TRUE, stringsAsFactors=FALSE)
CPak
  • This doesn't really address how to perform the paired t-test. Are you making the assumption that the data must be ordered? – MrFlick Jan 02 '18 at 15:49
  • @MrFlick, Perhaps I'm mistaken but I thought OP wanted a way to *filter the dataset for only those that have a paired observation*, not necessarily how to perform a paired t-test. (Note that Roman also only addresses this question, which made me think I'm not off base). I've updated my answer to order the pairs after filtering. Thanks – CPak Jan 02 '18 at 15:52
  • Just seems incomplete given the title. Some additional information or assumption would be needed to actually perform the test, unless I'm missing something. Reshaping would be much safer in my opinion. But that's really for the OP to decide I guess. – MrFlick Jan 02 '18 at 15:54
  • MrFlick, you are right, reshaping to wide would be safer because it ensures that the observations are linked together without worrying about whether they are in the same order in long format, which makes the paired t.test easier. But I was looking for a method to do this without reshaping to wide, which CPak addressed for me. Thank you both and Roman Luštrik for your help. – Ishtiaq Mawla Jan 02 '18 at 16:02
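
For what it's worth, the ordering concern raised in these comments can also be handled without reshaping by pairing the values explicitly on obs. A minimal base R sketch, assuming at most one row per obs at each time point (t1, t2, common, x, y and res are illustrative names; the rows are shuffled on purpose to show that row order does not matter):

```r
# Data copied from the question
df <- read.table(text = "obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

# Shuffle the rows so nothing depends on the original order
set.seed(1)
df <- df[sample(nrow(df)), ]

t1 <- df[df$time == 1, ]
t2 <- df[df$time == 2, ]

# Observations measured at both time points
common <- sort(intersect(t1$obs, t2$obs))

# Look up each common obs in both subsets so x[i] and y[i] are a true pair
x <- t1$value[match(common, t1$obs)]
y <- t2$value[match(common, t2$obs)]

res <- t.test(x, y, paired = TRUE)
res
```

Because match() does the lookup by obs, the pairs stay correct even if the long data frame is never sorted.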