Assign values to matrix based on condition

Question

I want to create a matrix with 3 columns and many rows assigning 1 or 0 if the condition is satisfied.

I have data stored in 3 variables

 df1 <- data.frame(names=c("A","B","C","D","E","F"))
 df2 <- data.frame(names=c("A","B","C","F"))
 df3 <- data.frame(names=c("E","F","H"))

output will be

     df1  df2 df3
  A  1    1   0
  B  1    1   0 
  C  1    1   0
  D  1    0   0
  E  1    1   1
  F  1    0   1
  H  0    0   1

In first row if A is present in dataset then I will assign 1 under each column and 0 if A not present in dataset

Here is what I have tried

 DF <- rbind(df1,df2,df3)
 for (i in DF) { 
     for (j in 1:length(df1$names)) {
              if(i == df1$names[j]){
                    A3 <-data.frame(paste0("",i),paste0(1),paste0(0),paste0(0))
                    names(A3) <- NULL 
              }
              else{
                    A3 <-data.frame(paste0("",i),paste0(0),paste0(0),paste0(0))

              }
  }
}

I have written this code only for df1 but its very slow because I have more than 1500 rows in my orignal data set. What would be the fastest way to do it?

score 3 · Accepted Answer · answered Sep 08 '15 at 09:39

3

Add a grouping variable to each dataframe:

df1 <- data.frame(names=c("A","B","C","D","E","F"),group="df1")
df2 <- data.frame(names=c("A","B","C","F"),group="df2")
df3 <- data.frame(names=c("E","F","H"),group="df3")
DF <- rbind(df1,df2,df3)

Then do this:

   res <- table(DF)

> res
     group
names df1 df2 df3
    A   1   1   0
    B   1   1   0
    C   1   1   0
    D   1   0   0
    E   1   0   1
    F   1   1   1
    H   0   0   1

Or if you want a dataframe:

library(reshape2)
dcast(names~group, data=DF,fun.aggregate = length)

answered Sep 08 '15 at 09:39

Heroka

12,889
1
28
38

This works for the exampledata you've given, I don't now if there are multiple instances of the same string in your data? – Heroka Sep 08 '15 at 10:17
Thank you so much Heroka. I don't have duplicates in my data – hash Sep 08 '15 at 10:43

Jaap · Answer 2 · 2015-09-08T15:52:34.470

When using the idcol parameter in rbindlist of the data.table package, there is no need of creating a grouping column for each dataframe separately:

library(data.table) # I used v1.9.5 for this
DT <- rbindlist(list(df1, df2, df3), idcol="id")
dcast(DT[, .N , by=.(id,names)], names ~ id, fill=0)

which gives:

   names 1 2 3
1:     A 1 1 0
2:     B 1 1 0
3:     C 1 1 0
4:     D 1 0 0
5:     E 1 0 1
6:     F 1 1 1
7:     H 0 0 1

score 0 · Answer 3 · answered Sep 08 '15 at 09:44

%in% operator lets you check if a string is present in a vector of strings. It is also vectorised, so it works quite quick:

x=c(LETTERS[c(1:6,8)])
df=data.frame(x=x,df1=as.numeric(x %in% df1$names),
          df2=as.numeric(x %in% df2$names),
          df3=as.numeric(x %in% df3$names))
df

If speed is crucial, {data.table} package gives a little speed boost with %chin% operator:

library(data.table)
x=c(LETTERS[c(1:6,8)])
dt=data.table(x=x,df1=as.numeric(x %chin% as.character(df1$names)),
          df2=as.numeric(x %chin% as.character(df2$names)),
          df3=as.numeric(x %chin% as.character(df3$names)))
dt

score 0 · Answer 4 · edited May 23 '17 at 12:29

The code below is slightly more general than the other answers. Also, I think it's useful to know how to dynamically create commands... I use the data frames as you prepared them:

df1 <- data.frame( names = c( "A", "B", "C", "D", "E", "F") )
df2 <- data.frame( names = c( "A", "B", "C"," F") )
df3 <- data.frame( names = c( "E", "F", "H") )

DF <- rbind( df1, df2, df3 )
nDF <- unique( DF ) #we don't want to duplicate tests.

Then the main loop is just like this:

n_ <- 3
for( ii in 1 : n_){
nDF[ paste0( "df", ii ) ] <- as.logical( NA ) #dynamically creates a new variable in your data frame

cmnd <-  paste0("nDF$names %in% df",ii,"$names") #dynamically creates the appropriate command (in this case you want to test e.g. whether "nDF$names %in% df1$names".

nDF[ paste0("df",ii)]  <- eval( parse( text = cmnd ) ) #evaluates the dynamically created command and saves it into the previously created variable.
}

Should be relatively fast. But if you don't have duplicates in your data, then heroka's suggestion to this questions is probably the way to go.

Assign values to matrix based on condition

4 Answers4