1

I have three tables. One which contains the results. The other two contain notes about unique ids that appear in the results. I will try to explain the format a little bit.

RESULTS

UniqueID1 UniqueID2 [Columns containing the results from some experiment.]

NOTES1

UniqueID [Notes about this particular uniqued ID]

NOTES2

same as above.

So, I want every row in the results to contain the notes from the NOTES table. I now accomplish this iteratively; this process now takes longer than the experiment itself. I have considered using subsetting or lapply perhaps, but can't seem to find something fitting.

Some things to consider.

Regarding results: The two UniqueID columns in results are disjoint. They are not sets; they contain repetition. However, each pair of unique IDs occurs exactly one time in the results. table.

Regarding NOTES: The unique id column does not contain repetition.

The details I have given above are part of why I haven't just lazily subsetted the NOTES table and called cbind on results and the subset of NOTES.

I believe I have devoted a decent bit of thought to this; please don't downvote just because the answer is trivial to you. I haven't included code, because I want to know how accomplish this for the general case in R without iteration. If you think that it will help, I could include some code.

Sample Tables:

Results

"UniqueID_1"    "UniqueID_2"    "P_val" "Corr_Coef"
"M200000001"    "M200000113"    "0.530536974203384" "-0.115081242888051"
"M200000001"    "M200000113"    "0.530536974203384" "-0.115081242888051"
"M200000001"    "M200000114"    "0.252990706016934" "0.399292132865147"
"M200000001"    "M200000115"    "0.368094246871692" "0.16456647390621"
"M200000001"    "M200000116"    "0.78457345841227"  "-0.0502948318806314"
"M200000001"    "M200000117"    "0.874022710686625" "-0.0291832590390869"
"M200000001"    "M200000121"    "0.199903265024637" "0.232729988792225"
"M200000001"    "M200000122"    "0.892630363051574" "-0.024845579436259"
"M200000001"    "M200000126"    "0.27862153489629"  "-0.208127935743872"

NOTES

M200000002  22352   Vim
M200000004  20810   Srm
M200000005  17878   Myf6
M200000006  27061   Bcap31
M200000007  54204   1-Sep
M200000008  71995   Erv3
M200000009  15384   Hnrpab
M200000010  14453   Gas2
M200000011  12054   Bcl7b
M200000012  26572   Cops3
M200000014  17896   Myl4
M200000016  52469   Ccdc56
M200000017  19720   Trim27
M200000019  11434   Acr
seancarmody
  • 6,182
  • 2
  • 34
  • 31
order
  • 335
  • 1
  • 7
  • 14
  • 2
    Yes, sample data will help. For one thing, it's not clear to me how UniqueID1 and UniqueID2 in the results match up with UniqueID in the two notes tables; is UniqueID1 for Notes1, for example? It would also help to show using the sample data why subsetting/cbind won't work. – Aaron left Stack Overflow Aug 24 '12 at 20:16
  • 2
    Would `merge` help, or does that run into a problem? – Blue Magister Aug 24 '12 at 21:10
  • @Aaron UniqueID1 and UniqueID2 correspond to distinct NOTES tables. UniqueID1 corresponds to NOTES1 and UniqueID2 corresponds to NOTES2. – order Aug 24 '12 at 21:10
  • @BlueMagister I thought this might head toward database-esque function. Looks promising I will give it a shot and let you know. – order Aug 24 '12 at 21:14
  • It still isn't clear to me why subset and cbind aren't working for you; using `dput` on your data might make it clearer. – Aaron left Stack Overflow Aug 25 '12 at 01:27
  • 1
    You may also find this useful: [merge-multiple-data-frames-in-a-list-simultaneously](http://stackoverflow.com/questions/8091303/merge-multiple-data-frames-in-a-list-simultaneously) – flodel Aug 25 '12 at 02:18
  • What is the name of the notes column in each of your NOTES tables? Also, can a single results row have entries in both NOTES1 and NOTES2? Do you want the notes from NOTES1 and NOTES2 to be in the same column in your new table, or in two separate columns? – seancarmody Aug 25 '12 at 04:25

2 Answers2

3
merge(results, notes, by="UniqueID_1", all.x=TRUE)

or, for the case of two notes:

merge(merge(results, notes1, by="UniqueID_1", all.x=TRUE),
      notes1, by="UniqueID_2", all.x=TRUE)

You haven't told us the names of your NAMES columns, so if the ID column in each is actually UniqueID then you'd need to do something like this:

merge(merge(results, notes1, by.x="UniqueID_1", by.y="UniqueID", all.x=TRUE),
      notes1, by.x="UniqueID_2", by.y="UniqueID", all.x=TRUE)

The notes from notes1 and notes2 will be in different columns.

seancarmody
  • 6,182
  • 2
  • 34
  • 31
  • If you want to get fancy, you can have a look at the `sqldf` package. – seancarmody Aug 25 '12 at 04:19
  • Thanks for the answer. This is what I am looking for; now i Just have to work out implementing. I apologize for not responding sooner. I have been busy coding other sections. When I return to it, I am quite sure this ought to work. – order Aug 27 '12 at 23:03
0

It still isn't clear to me why subset and cbind aren't working for you; it would seem something like this would work fine.

 cbind(Results, Notes1[Results$UniqueID_1,], Notes2[Results$UniqueID_2,])

This would require the rownames of the Notes tables to be the ID; perhaps that's the issue?

 rownames(Notes1) <- Notes1$UniqueID
 rownames(Notes2) <- Notes2$UniqueID

match is also useful in cases like this; it's what merge is using behind the scenes.

Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142