-1

I'm a new R user I'm more used to SAS.

I have a dataset so:

 test| tester |obs1| obs2
  test1|1     | Y   |Y
  test1|2     |Y    |N
  test2 |1    |Y    |Y
  test2 |2    | Y   |Y
  test3 |2     |N   |N

I want to 1) only keep tests where both testers took a test and 2) transform the data so it looks like this:

 test|  obs1_tester1 | obs1_tester_2| obs2_tester1 | obs2_tester2
 test1|    Y          | Y            | Y            | N
 test2|   Y           | Y            |Y             |Y
 test4.....

I know that in SAS I could make a variable that counted the number of times each test was taken then delete any observation where the maximum number of times the test was taken was less than 2. Then use the transpose function to easily transpose the data set.

I don't really understand how to do this in R. Can someone point me in the right direction?

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
Jacob Ian
  • 669
  • 3
  • 9
  • 17
  • Is it based on the 'Y' in both 'obs1' and 'obs2'? For the second row, 'obs2' is 'N' for test1. – akrun Sep 11 '16 at 16:09
  • Y is just Yes or No so the transpose is based on the tester. so each test has 1 row and the difference between testers is in the column names – Jacob Ian Sep 11 '16 at 16:16

1 Answers1

0

We can gather the data to 'long' format, after grouping by 'test', 'tester', filter only the groups that have any "Y" in the 'Val' column, ungroup, mutate the 'tester' by pasteing the prefix 'tester' (if needed), unite the columns ('Var', 'tester') to a single column ('varTest') and spread to 'wide' format.

library(tidyr)
library(dplyr)
gather(df1, Var, Val, obs1:obs2) %>%
        group_by(test, tester) %>% 
        filter(any(Val=="Y")) %>% 
        ungroup() %>%
        mutate(tester = paste0("tester", tester)) %>% 
        unite(VarTest, Var, tester) %>% 
        spread(VarTest, Val)
#   test obs1_tester1 obs1_tester2 obs2_tester1 obs2_tester2
#  <chr>        <chr>        <chr>        <chr>        <chr>
#1 test1            Y            Y            Y            N
#2 test2            Y            Y            Y            Y

data

df1 <- structure(list(test = c("test1", "test1", "test2", "test2", "test3"
), tester = c(1L, 2L, 1L, 2L, 2L), obs1 = c("Y", "Y", "Y", "Y", 
"N"), obs2 = c("Y", "N", "Y", "Y", "N")), .Names = c("test", 
"tester", "obs1", "obs2"), class = "data.frame", row.names = c(NA, 
-5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I'm confused why we need to filter. I will need to look up mutate. So Var and Val are what you are naming the columns? – Jacob Ian Sep 11 '16 at 16:21
  • @Jacoblan Yes, the 'Var' and 'Val' are the `key/val' columns you get by reshaping the measure columns 'obs1' and 'obs2'. The `filter` is to remove those 'test' that are not showed in the output. – akrun Sep 11 '16 at 16:23
  • Your output under your code is what I want for now. When I use this code on my data though I get an error: Error in eval(substitute(expr), envir, enclos) : object 'Val' not found so I thinnk I don't understand what you mean with Var/Val – Jacob Ian Sep 11 '16 at 16:30
  • @JacobIan If you check the `?gather` documentation the usage is `gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)`. Here, `key = Var` and `value = Val`, the `...` are the columns obs1, obs2. I am using `tidyr_0.6.0` – akrun Sep 11 '16 at 16:34
  • 1
    OK I think I understand. I think I must have something wrong with another step of this it isn't recognizing the variable that is tester even though it prints it when I use the head function. I'll have to play with it. All these functions are very new to me. Thanks – Jacob Ian Sep 11 '16 at 16:43