0

I have two data frames that I would like to locate matching elements on. Then I want to assign a new value to one of the data frames based on the matching elements. The design data frame looks like this:

   ss clusters  ICC items CondNum
1 300       10 0.05    10       1
2 300       10 0.05    20       2
3 300       10 0.05    50       3
4 300       10 0.05    70       4
5 300       10 0.10    10       5

...
235 5000      150 0.3    50     235
236 5000      150 0.3    70     236
237 5000      150 0.4    10     237
238 5000      150 0.4    20     238
239 5000      150 0.4    50     239
240 5000      150 0.40   70     240

It has 240 rows and 5 columns. The conditions data frame has 60,000 rows and the same first 4 columns as the design data frame. Each row of the conditions df matches a row in the design df (excluding the final column). I want to assign the CondNum from the design df to the matching rows in the conditions data frame. So for example, the conditions data looks like this:

    ss clusters   ICC items
1 1000       10 0.053    10
2 1000       10 0.053    10
3 1000       10 0.053    10
4 300        10  0.10    20
...
51,998 5000      100 0.4    20     
51,999 5000      100 0.4    20    

I want to add a column to the conditions df and assign the CondNum value from the design df to the matching element in the conditions df. I haven't been able to find an answer to this specific issue. So the final result should look like:

>condition
    ss clusters   ICC items CondNum
1 1000       10 0.053    10     108
2 1000       10 0.053    10     108
3 1000       10 0.053    10     108
4 300        10  0.10    20       2
...
51,998 5000      100 0.4    20   210 
51,999 5000      100 0.4    20   210 

Any help is appreciated.

Kate N
  • 423
  • 3
  • 14
  • 1
    Sounds like you need to do a join using `merge()` - [check out this link](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – ila Jul 06 '15 at 16:28

1 Answers1

1

You could use left_join from dplyr:

> design
#   ss clusters  ICC items CondNum
#1 300       10 0.05    10       1
#2 300       10 0.05    20       2
#3 300       10 0.05    50       3
#4 300       10 0.05    70       4
#5 300       10 0.10    10       5

> condition
#    ss clusters  ICC items
#1  300       10 0.05    20
#2  300       10 0.05    50
#3 1000       10 0.05    70
#4  300       10 0.10    10

> dplyr::left_join(condition, design)
#Joining by: c("ss", "clusters", "ICC", "items")
#    ss clusters  ICC items CondNum
#1  300       10 0.05    20       2
#2  300       10 0.05    50       3
#3 1000       10 0.05    70      NA
#4  300       10 0.10    10       5

Or as per mentioned in the comments, you could use merge() from base R:

> merge(condition, design, all.x = TRUE)
#    ss clusters  ICC items CondNum
#1  300       10 0.05    20       2
#2  300       10 0.05    50       3
#3  300       10 0.10    10       5
#4 1000       10 0.05    70      NA

Note: I modified the datasets for the purpose of the example


Data

## design
design <- structure(list(ss = c(300L, 300L, 300L, 300L, 300L), clusters = c(10L, 
10L, 10L, 10L, 10L), ICC = c(0.05, 0.05, 0.05, 0.05, 0.1), items = c(10L, 
20L, 50L, 70L, 10L), CondNum = 1:5), .Names = c("ss", "clusters", 
"ICC", "items", "CondNum"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

## condition
condition <- structure(list(ss = c(300L, 300L, 1000L, 300L), clusters = c(10L, 
10L, 10L, 10L), ICC = c(0.05, 0.05, 0.05, 0.1), items = c(20L, 
50L, 70L, 10L)), .Names = c("ss", "clusters", "ICC", "items"), 
class = "data.frame", row.names = c("1", "2", "3", "4"))
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77