0

Please have a close look at the example data sets and desired outcome to see the purpose of this question. It is not a merging data sets solution what I am looking for. So I could find the answer neither here: How to join (merge) data frames (inner, outer, left, right)?, nor here Use apply() to assign value to new column. It rather refers to a solution for assigning values to new colnames if they meet a condition.

Here a reproducible illustration of what I would like to do:

Email <- as.factor(c("1@1.com", "2@2.com", "3@3.com","4@4.com", "5@5.com"))
dataset1 <- data.frame(Email)


Code <- as.factor(c("Z001", "Z002", "Z003","Z004","Z005"))
Email <- as.factor(c("x@x.com", "2@2.com", "y@y.com", "1@1.com","z@z.com"))
dataset2 <- data.frame(Code, Email)

This results in the following example datasets:

  Email
1 1@1.com
2 2@2.com
3 3@3.com
4 4@4.com
5 5@5.com

   Code   Email
1  Z001 x@x.com
2  Z002 2@2.com
3  Z003 y@y.com
4  Z004 1@1.com
5  Z005 z@z.com

Desired output:

    Email      Z002  Z004
  1 1@1.com    NA     1
  2 2@2.com    1      NA
  3 3@3.com    NA     NA 
  4 4@4.com    NA     NA
  5 5@5.com    NA     NA

So I would like to write a loop that checks whether the Email of dataset2 occurs in dataset1, and if this condition is true, that the Code associated with the Email in dataset2, is assigned as a new column name to dataset1 with a 1 as cell value for this observation. My attempt to get this done and an example of the desired output clarifies the question.

My own attempt to fix it (I know it is wrong, but shows my intention):

for(i in 1:nrow(dataset2)){
  if(dataset2$Email[i] %in% dataset1$Email)
    dataset1[,dataset2$Code[i]] <- dataset2$Code[i]
    dataset1[,dataset2$Code[i]][i] <- 1
}

Would be great if anyone could help me out.

Community
  • 1
  • 1
Floris
  • 11
  • 7

1 Answers1

1

Your dataset2 is in "long" format - changing the Code column into multiple columns is changing it to "wide" format. So in addition to the join, we also need to convert from long to wide - this R-FAQ is a good read on that. Combining these two operations, we do this:

dat = merge(dataset1, dataset2, all.x = T)  ## left join
dat$value = 1   ## add the value we want in the result
## convert long to wide
result = reshape2::dcast(dat, Email ~ Code, value.var = "value", drop = T)
result["NA"] = NULL  ## remove the NA column that is added
result
#     Email Z002 Z004
# 1 1@1.com   NA    1
# 2 2@2.com    1   NA
# 3 3@3.com   NA   NA
# 4 4@4.com   NA   NA
# 5 5@5.com   NA   NA
Community
  • 1
  • 1
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Thank you so much! I've been looking for this a long time so your help is much appreciated :) – Floris Feb 20 '17 at 20:22