R, Analysis on observations that do not merge

Question

I have these two tables I subsetted from a larger table:

table1 <- parent_table[parent_table$diag %in% c('a', 'b', 'c'), parent_table$patient_id] 

table2 <- parent_table[parent_table$med %in% c('d', 'e', 'f'), parent_table$patient_id]

I then merge these two tables, on patient_id to see which patients received a medication for the diagnosis.

merge1 <- merge(table1, table2, by="patient_id", all = TRUE)

I would now like do analysis on those who did NOT receive an antibiotic, i.e. those who are included in table1, but not merge1.

Welcome to Stack Overflow! Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Without the reproducible example, I can only guess that `setdiff(table1$patient_id, table2$patient_id)` may do the trick. — Jota, May 06 '15 at 14:37
with the same caveat as Frank: merge(df1, df2, all = T). It is helpful to add df1$label = "label1" and df2$label = "label2". — Henk, May 06 '15 at 14:40
Do you mean patients who are included in `table1` but not in `table2`? You specify not in `merge1`, which doesn't make sense since merging with `all = TRUE` will contain all of the patients in `table1` (and potentially more). — Alex A., May 06 '15 at 16:47

score 4 · Answer 1 · answered May 06 '15 at 14:43

You could use merge1<- merge(table1, table2, by="patient_id", all = TRUE), then subset to get the records where medication is empty. i.e.

nomeds <- subset(merge1, is.na(medication))

Which would then give you a new dataset, nomeds, which you could do whatever analysis you like on.

I'm not sure if this is what you mean - it would be easier to help if you give a reproducible example with a dummy dataset :)

Mark · Answer 2 · 2015-05-06T19:05:17.547

2

If you're willing to go outside of base R, dplyr has a great operator for this:dplyr::anti_join(a,b,by="x1") returns all rows in a that are not in b, so, merge1=anti_join(table1,table2,by="patientID" will give you want you want.

Reproducible example:

 #Table of people and how much they make
x = data.frame(id=c(1,2,3,4,5),Name=c("Bob","Betty","Carl","Catherine","Dilbert"),Salary=c(65000,78000,27000,36000,172000))
> x
  id      Name Salary
1  1       Bob  65000
2  2     Betty  78000
3  3      Carl  27000
4  4 Catherine  36000
5  5   Dilbert 172000    

 #Table of Women
y = data.frame(id=c(2,4),Name=c("Betty","Catherine"),Gender="Female"
> y
  id      Name Gender
1  2     Betty Female
2  4 Catherine Female

 #Table of Men
> anti_join(x,y,by="id")
  id    Name Salary
1  3    Carl  27000
2  1     Bob  65000
3  5 Dilbert 172000

edited May 06 '15 at 19:05

answered May 06 '15 at 15:23

Mark

4,387
2
28
48

Did you mean `merge1 <- dplyr::anti_join(table1, table2, by = "patient_id")`? Right now you're merging `table1` with `merge1`. – Alex A. May 06 '15 at 16:27
@AlexA. The OP said he has table1, table2, merges them and then wants the rows from table1 that aren't in the merged table. That would be as I've typed it. If the real goal is to get rows in table1 that aren't in table2, then your code is better. I'll edit the answer. – Mark May 06 '15 at 16:32
1

The thing is, merging with `all = TRUE` still gives you the same set or a superset of patients in `table1`. So to get patients who did not receive a particular medication, you'd want to anti-join such that the patients in `table2` are excluded from `table1`, as you've done in your edit. `merge1` in the OP's last sentence may have been a typo because otherwise it makes no sense. – Alex A. May 06 '15 at 16:41

score 0 · Answer 3 · answered May 06 '15 at 15:11

Since you're simply subsetting table1 to patients not present in table2, you can do a simple subset operation rather than merging:

nomeds <- table1[!(table1$patient_id %in% table2$patient_id), ]

For each row of table1, this checks whether patient_id is in the list of patient_ids from table2. If not, the row is kept, otherwise it's excluded.

vagabond · Answer 4 · 2015-05-06T16:49:59.363

0

if you merge and use the arg all = TRUE, for every patient_id that was diagnosed but did not receive med - you will get an NA value. That should answer your problem. I have edited your post with the code.

parent_table <- data.frame(patient_id = c(1,2,3,4,5,6, 7, 8), 
                           diag = c("a", "b", "d", "d", "e", "c", "f", "e"), 
                           med = c("c", "d", "a", "e", "d", "f", "a", "b"))

table1 <- parent_table[parent_table$diag %in% c('a', 'b', 'c'), c("patient_id", "diag")]
table2 <- parent_table[parent_table$med %in% c('d', 'e', 'f'), c("patient_id", "med")]

merge1 <- merge(table1, table2, by="patient_id", all = TRUE)

> merge1
  patient_id diag  med
1          1    a <NA>
2          2    b    d
3          4 <NA>    e
4          5 <NA>    d
5          6    c    f

edited May 06 '15 at 16:49

answered May 06 '15 at 15:29

vagabond

3,526
5
43
76

This doesn't answer the question at all, especially after you edited the code to only select the patient ID from each table. Then if you merge with `all = TRUE` you'll have no `NA`s for anything because you're only dealing with a single column. Also your edit really shouldn't have been approved--it makes no sense as an edit. Code should not be "corrected" in others' posts. Instead, post a comment suggesting code to ensure it matches the OP's intent. – Alex A. May 06 '15 at 16:23
let me re-check that. – vagabond May 06 '15 at 16:25
OK, will adding the diag column and med column when subsetting from the parent table and then merging on patient id solve it? I think i selected only the patient id column. – vagabond May 06 '15 at 16:28
Yes, but then this answer would be identical to @Froom2's, which was posted earlier. But again, you should _not_ edit the OP's code. The OP may or may not be selecting any particular set of columns. – Alex A. May 06 '15 at 16:29
The OP's code was not code at all . It was indented but not working code. Therefore I corrected to make the problem understandable. I didn't see @froom2's code before posting which is why i just mentioned adding all=TRUE. It would be nice if you can add the diag and med column. – vagabond May 06 '15 at 16:33

R, Analysis on observations that do not merge

4 Answers4