I have a data frame containing several participants who each performed a task 6 times. For example, the data looks like this:

    subject blockN
1        133      1
17       133      2
33       133      3
49       133      4
65       133      5
81       133      6
97       134      1
113      134      2
129      134      3
145      134      4
161      134      5
177      134      6
193      135      1
209      135      2
225      135      3
241      135      4
257      135      5
273      135      6
289      136      1
305      136      2

Additionally, I have a list (named excludeTrials) of subject/block pairs identifying the trials I need to exclude:

[[1]]
[1] 133   5

[[2]]
[1] 135   1

[[3]]
[1] 135   1

[[4]]
[1] 140   1

Now I tried to subset the data frame based on those values. I wanted to avoid looping over it, so I tried to solve it with sapply:

df[df$subject %in% sapply(excludeTrials, "[[", 1) & df$blockN %in% sapply(excludeTrials, "[[", 2), ]

and

subset(df, !( (df$subject %in% sapply(excludeTrials, "[[", 1)) & (df$blockN %in% sapply(excludeTrials, "[[", 2)) ) )

The problem is that these lines disregard the fact that both values need to match in the same row: the logical expression returns TRUE for every block (1-6) of any participant that appears as the first element of a pair in the list.
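To make the over-matching concrete, here is a minimal sketch on toy data. The values are made up to mirror the example above and are not my real data:

```r
# Toy data mirroring the example above (illustrative values only)
df <- data.frame(subject = rep(c(133, 134, 135), each = 6),
                 blockN  = rep(1:6, times = 3))
excludeTrials <- list(c(133, 5), c(135, 1))

# Each %in% is evaluated on its own column, so the pairing is lost:
# any listed subject combined with any listed block counts as a hit
hit <- df$subject %in% sapply(excludeTrials, "[[", 1) &
       df$blockN  %in% sapply(excludeTrials, "[[", 2)

df[hit, ]
```

Here the rows (133, 1) and (135, 5) are flagged as well, although neither pair is in the list.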

Is there a way to solve this without a loop?

Edit:

dput(head(df))
structure(list(subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("133", 
"134", "135", "136", "139", "140", "142", "143", "144", "145", 
"146", "148", "149", "150", "151", "152", "153", "154", "155", 
"156", "157", "158", "159", "160", "161", "162", "163", "164", 
"165", "166", "167", "168", "169", "171", "172", "173", "174", 
"175", "176", "177", "178", "179", "180", "181", "182", "183", 
"184", "185", "186", "187", "188", "189", "190", "191", "192", 
"194", "195", "196", "197", "198", "199", "200", "201", "202", 
"203", "204", "205", "206", "207", "208", "209", "211", "212", 
"213", "214", "215", "216", "217", "219", "220", "221", "222", 
"223", "224", "225", "226", "227", "228", "229", "230", "232", 
"233", "234", "235", "237", "238", "239", "240", "241", "242", 
"243", "244", "245", "246", "247", "248", "249", "250", "251", 
"252", "253", "254", "255", "256", "257", "258", "259", "260", 
"261", "262", "263", "264", "265", "266", "267", "268", "269", 
"270", "271", "272", "273", "274", "275", "276", "277", "278", 
"279", "280", "281", "282", "283", "284", "285", "286", "287", 
"288", "289", "290", "292", "293", "294", "295", "296", "297", 
"298", "299", "300", "301", "302", "303", "304", "305", "306", 
"307", "308", "309", "310", "311", "312", "313", "314", "315", 
"316", "317", "318", "319", "320", "321", "322", "323", "324", 
"325", "326", "327", "328", "329", "330", "331", "332", "333", 
"334", "335", "336", "337", "338", "339", "340", "341", "342", 
"343", "344", "345", "346", "347", "348", "349", "350", "351", 
"352", "353", "354", "355", "356", "357", "358", "359", "360"
), class = "factor"), blockN = c(1, 2, 3, 4, 5, 6)), row.names = c(97L, 
113L, 129L, 145L, 161L, 177L), class = "data.frame")

dput(head(excludeTrials))
list(c(133, 5), c(135, 1), c(135, 1), c(140, 1), c(145, 5), c(146, 
2))

carina__u
  • Please create your sample data using `dput(mydata)`. You'll get better answers. – Wimpel Jun 09 '20 at 07:44
  • If you convert your `excludeTrials` list to a data frame with the appropriate column names, you can use `dplyr::anti_join(df, excludeTrials)`. See [How to get the complement of a data frame](https://stackoverflow.com/q/28702960/903061) for more ideas - though some of the answers would only work matching a single column, not 2 like you have. – Gregor Thomas Jun 09 '20 at 07:49

1 Answer

Convert excludeTrials into a two column dataframe and then use anti_join from dplyr.

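# Bind the pairs row-wise into a two-column data frame named like df
mat <- setNames(do.call(rbind.data.frame, excludeTrials), names(df))
# df$subject is a factor, so make mat$subject one too
mat$subject <- factor(mat$subject)
# Keep only the rows of df that have no matching (subject, blockN) row in mat
dplyr::anti_join(df, mat, by = c("subject", "blockN"))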
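
If you would rather stay in base R, the same row-wise matching can be sketched by building one key per row and one per excluded pair. This is only a sketch on toy data: `rowKey`, `exclKey` and `result` are names made up for illustration, and it assumes subject IDs and block numbers are whole numbers that print identically on both sides.

```r
# Toy data (illustrative); subject is a factor, as in the asker's data
df <- data.frame(subject = factor(rep(c(133, 134, 135), each = 6)),
                 blockN  = rep(c(1, 2, 3, 4, 5, 6), times = 3))
excludeTrials <- list(c(133, 5), c(135, 1), c(135, 1), c(140, 1))

# One text key per data row; as.character() uses the factor labels,
# not the internal level codes
rowKey <- paste(as.character(df$subject), df$blockN)

# One text key per excluded (subject, blockN) pair
exclKey <- vapply(excludeTrials, function(p) paste(p[1], p[2]), character(1))

# Drop only the rows whose combined key appears in the exclusion list
result <- df[!rowKey %in% exclKey, ]
```

Because subject and block are compared as a single key, a row is dropped only when both values match the same pair. Duplicate pairs and pairs for subjects that are not in `df` are harmless.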
Ronak Shah
  • Thanks for your answer. I tried both versions, but for me they still exclude the whole participant. What could be the reason that it works for you but not for me? – carina__u Jun 09 '20 at 08:54
  • @carina__u Can you show your expected output for the example shared? – Ronak Shah Jun 09 '20 at 08:55
  • The expected output would be what you wrote: # subject blockN #17 133 2 #33 133 3 #49 133 4 #81 133 6 #97 134 1 #113 134 2 #129 134 3 #145 134 4 #161 134 5 #177 134 6 #209 135 2 #225 135 3 #241 135 4 #273 135 6 What I get is: 97 134 1 113 134 2 129 134 3 145 134 4 161 134 5 177 134 6 289 136 1 305 136 2 – carina__u Jun 09 '20 at 09:09
  • In that case can you edit your post to include `dput(df)` and `dput(excludeTrials)` just as I have added in my answer? Maybe you have some different data structure. – Ronak Shah Jun 09 '20 at 09:11
  • What I get from dput() is too long even for my console. What should I do in that case? – carina__u Jun 09 '20 at 09:12
  • In that case, adding `dput(head(df))` and `dput(head(excludeTrials))` would be enough to show the first 6 rows. – Ronak Shah Jun 09 '20 at 09:13
  • I think the problem might be that subject is a factor. So you were right, my structure looks different: structure(list(subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, ...), .Label = c("133", "134", "135", "136", ...) – carina__u Jun 09 '20 at 09:15
  • Please edit your main question to include both `dput` outputs. – Ronak Shah Jun 09 '20 at 09:16
  • And: dput(head(excludeTrials)) list(c(133, 5), c(135, 1), c(135, 1), c(140, 1), c(145, 5), c(146, 2)) – carina__u Jun 09 '20 at 09:16
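
As background for the factor suspicion raised in the comments above, here is a small sketch of how a factor's internal codes differ from its labels. The vector is a toy example, not the asker's data, and whether this is what actually caused the mismatch in the thread is not confirmed:

```r
# A factor stores integer level codes; the subject IDs are only labels
subject <- factor(c("134", "134", "136"),
                  levels = c("133", "134", "135", "136"))

codes  <- as.numeric(subject)                # level codes: 2 2 4
labels <- as.numeric(as.character(subject))  # actual IDs: 134 134 136
```

Converting through `as.character()` before comparing against the numeric IDs in `excludeTrials` avoids matching on the level codes by accident.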