I have a data frame DF which contains numerous variables. Each variable is present twice because I am conducting an analysis of "couples".
Among others, DF has a series of indicators of diversity:
DF$div1.1, DF$div2.1, ..., DF$divN.1, DF$div1.2, ..., DF$divN.2
Similarly, it has a series of indicators of another characteristic:
DF$char1.1, DF$char2.1, ..., DF$charM.1, DF$char1.2, ..., DF$charM.2
Here's a link to an example of DF: http://shorttext.com/5d90dd64
In each case, the suffixes ".1" and ".2" indicate which member of the couple the variable refers to.
My goal:
For each pair of indicators divI and charJ, I want to create another variable DF$divchar that takes the value DF$divI.1 when DF$charJ.1 > DF$charJ.2, and DF$divI.2 when DF$charJ.1 < DF$charJ.2.
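To make the goal concrete, here is a minimal sketch on made-up data with a single div/char pair. The data frame toy and its values are purely illustrative assumptions, and the row-by-row loop is deliberately naive: it only pins down the values I expect and is not the approach I am actually using.

```r
# Toy data (made-up values): one diversity indicator and one characteristic,
# each observed for both members of three couples.
toy <- data.frame(
  div1.1  = c(10, 20, 30),
  div1.2  = c(11, 21, 31),
  char1.1 = c(5, 1, 7),
  char1.2 = c(2, 4, 3)
)

# Naive row-by-row construction of the desired variable.
# Ties (char1.1 == char1.2) stay NA because the goal does not say what to do with them.
divchar <- rep(NA_real_, nrow(toy))
for (i in seq_len(nrow(toy))) {
  if (toy$char1.1[i] > toy$char1.2[i]) {
    divchar[i] <- toy$div1.1[i]   # member 1 has the larger char value
  } else if (toy$char1.1[i] < toy$char1.2[i]) {
    divchar[i] <- toy$div1.2[i]   # member 2 has the larger char value
  }
}

divchar
```

Here the first and third couples take the value from member 1 and the second couple takes it from member 2.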
Here is the solution I came up with. It seems overly intricate and sometimes behaves in strange ways:
I created a series of binary variables that take the value 1 if DF$charJ.1 > DF$charJ.2. They are stored under DF$CharMax.1. Here's how I created it:
DF$CharMax.1 <- as.data.frame(sapply(1:length(nam), function(n) as.numeric(DF[names(DF)==names.1[n]] > DF[names(DF)==names.2[n]])))
I created the function BinaryExtract:
BinaryExtract <- function(var1, var2, extract) {var1*extract + var2*(1-extract)}
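As a quick check of what BinaryExtract does, here is a minimal sketch on made-up vectors. The names first, second and flag are hypothetical and only serve the illustration.

```r
# Same definition as above, repeated so the snippet runs on its own.
BinaryExtract <- function(var1, var2, extract) {
  var1 * extract + var2 * (1 - extract)
}

first  <- c(10, 20, 30)   # values for couple member 1
second <- c(11, 21, 31)   # values for couple member 2
flag   <- c(1, 0, 1)      # 1 = take member 1, 0 = take member 2

result <- BinaryExtract(first, second, flag)
result
```

Because the flag is built with a strict >, a tie gives 0, so tied couples end up with the value of member 2.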
I created the matrix NameFull that contains all the possible combinations of div and char, separated with "YY":
NameFull <- sapply(c("div1",...,"divN"), function(nam) paste(nam, names(DF$YMax.1), sep="YY"))
And then I create all my variables:
DF[, as.vector(NameFull)] <- lapply(as.vector(NameFull), function(e)
  BinaryExtract(DF[, paste0(unlist(strsplit(e, "YY"))[1], ".1")],
                DF[, paste0(unlist(strsplit(e, "YY"))[1], ".1")],
                DF$charMax.1[unlist(strsplit(e, "YY"))[2]]))
My Problem
A. It looks like a very complicated solution for something so simple. What am I missing?
B. Moreover, when I print DF by just typing DF in the command window, I do not see the variables in NameFull. They seem to appear with the names of char instead.
Here's what I get: http://shorttext.com/5d9102c
Similarly, I have tried to change all their names to get rid of the "YY" and it does not seem to work:
names(DF[, as.vector(NameFull)]) <- as.vector(sapply(c("div1",...,"divN"), function(nam) paste(nam, names(DF$YMax.1), sep=".")))
When I look at names(DF), I keep getting the old names with the "YY". However, I do get a result if I explicitly call for them:
> DF[,"divIYYcharJ"]
I would really appreciate any suggestions, comments and explanations. I am quite new to R and am more used to Stata. I feel there is something deeply inefficient here. Thanks.