I have a data frame in R with 10,000 columns and roughly 4,000 rows. The entries are rsIDs (for example rs100987, rs1803920, etc.). Each rsID has a corresponding iHS score between 0 and 3. I have a separate data frame in which every possible rsID is in one column and its corresponding iHS score is in the next column. I want to convert my data frame of rsIDs into a data frame of the same dimensions containing the corresponding iHS scores. How do I do this?

This is what my file looks like now:

input ID     match 1    match 2     match 3 ......
rs6708       rs10089   rs100098    rs10567
rs8902       rs18079   rs234058    rs123098
rs9076       rs77890   rs445067    rs105023

This is what my iHS score file looks like (it has matching scores for every ID in the above file):

snpID      iHS
rs6708     1.23
rs105023   0.92
rs234058   2.31
rs77890    0.31

I would like my output to look like this:

match 1   match 2   match 3
0.89      0.34      2.45
1.18      2.31      0.67
0.31      1.54      0.92
Evan

1 Answer

Let's consider a small example:

(dat <- data.frame(id1 = c("rs100987", "rs1803920"), id2=c("rs123", "rs456"), stringsAsFactors=FALSE))
#         id1   id2
# 1  rs100987 rs123
# 2 rs1803920 rs456
(dat2 <- data.frame(id=c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score=5:1, stringsAsFactors=FALSE))
#          id score
# 1     rs123     5
# 2     rs456     4
# 3  rs100987     3
# 4 rs1803920     2
# 5  rs123456     1

Then you can do this operation with:

apply(dat, 2, function(x) dat2$score[match(x, dat2$id)])
#      id1 id2
# [1,]   3   5
# [2,]   2   4

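The call to match finds, for each ID in a column, the row of dat2 that holds that ID; indexing dat2$score by those row numbers then returns the matching scores. Note that apply returns a matrix rather than a data frame.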
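To make the lookup step concrete, here is a minimal sketch of what match returns on its own and what happens when an ID has no entry in the score table. The vector name `ids` and the ID "rs999" are made up for illustration; `dat2` mirrors the small example above.

```r
# Small stand-in for the answer's dat2 (same illustrative values).
dat2 <- data.frame(id = c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score = 5:1, stringsAsFactors = FALSE)

# "rs999" is a hypothetical ID that does not appear in dat2.
ids <- c("rs100987", "rs1803920", "rs999")

# match() returns, for each element of ids, the position of its first
# occurrence in dat2$id, or NA when there is none.
idx <- match(ids, dat2$id)
idx
# [1]  3  4 NA

# Indexing the score column by those positions performs the lookup;
# an NA position yields an NA score.
scores <- dat2$score[idx]
scores
# [1]  3  2 NA
```

If every ID in the large table really does appear in the score table, as the question states, no NA values should show up. Checking the result with anyNA() is a cheap way to confirm that assumption.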

josliber
  • I edited my original post, does this answer still hold? – Evan Jul 24 '15 at 18:03
  • @Evan yes, my posted code finds the matching scores. It looks like you've edited a completely new question at the bottom, which is not good practice on this website. Please instead use the "Ask Question" button to ask your new question about fitting normal distributions based on the matched data. – josliber Jul 24 '15 at 18:09
  • My bad, I took it off. Can you explain what the number 2 specifies? – Evan Jul 24 '15 at 18:13
  • @Evan it means the specified function is called on each column of `dat`. To operate on rows instead, you would use 1 instead of 2. – josliber Jul 24 '15 at 18:59
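As a small illustration of the second argument to apply discussed in the comments, the sketch below runs the same function over columns and then over rows. The matrix `m` and its values are invented for the example and are not part of the original data.

```r
# Toy matrix (illustrative values only) to show what the MARGIN argument does.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN = 2: the function receives one column at a time.
col_sums <- apply(m, 2, sum)
col_sums
#  a  b  c
#  3  7 11

# MARGIN = 1: the function receives one row at a time.
row_sums <- apply(m, 1, sum)
row_sums
# r1 r2
#  9 12
```

One caveat: when the function returns a vector rather than a single value, as the lookup function in the answer does, apply with MARGIN = 1 stores each row's result as a column, so the output is transposed relative to the input. For the lookup in this question, working column by column (MARGIN = 2) keeps the original layout.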