I have a script I have to run periodically where I use a varying number of assignment statements in R, like this:

r5$NWord_1<-ifelse(r5$Match==1,NA,r5$NWord_1)
r5$NWord_2<-ifelse(r5$Match==2,NA,r5$NWord_2)
r5$NWord_3<-ifelse(r5$Match==3,NA,r5$NWord_3)
r5$NWord_4<-ifelse(r5$Match==4,NA,r5$NWord_4)
r5$NWord_5<-ifelse(r5$Match==5,NA,r5$NWord_5)
r5$NWord_6<-ifelse(r5$Match==6,NA,r5$NWord_6)

r5$NWord_7<-ifelse(r5$Match==7,NA,r5$NWord_7)

The problem is that the number of "NWord" variables changes from run to run (usually between 5 and 7). I have the number of "NWord" variables stored separately as Size.

Size<-5

I have tried the following, but get() only works on whole objects, not on columns of a data frame:

for(i in 1:Size){
    get(paste("r5$NWord_",i,sep=""))<-ifelse(r5$Match==i,NA,get(paste("r5$NWord_",i,sep="")))
}

I am curious: What is the best way to automate this process so I do not have to manually run a subset of these statements every time?

Dan
  • Typically, you should change your data to long format (three columns: NWord, Value, Match). There are several packages facilitating this, e.g., the reshape2 package with its `melt` function. Then you need exactly one assignment only. – Roland Nov 01 '18 at 11:11
  • If the number of 'NWord's changes from run to run, that suggests that NWord should be the variable (rather than your hard-coded `NWord_1`, `NWord_2` ...). Is there a reason why you are using `NWord_1` instead of `NWord[1]`? – Russ Hyde Nov 01 '18 at 11:19
  • Ultimately, I need a data set that is one row per analysis unit (in other words, wide format). So you all are suggesting the best way to do this is to convert the data to long format, use 1 ifelse() statement, and then convert the data back to wide format? That seems like a significant detour. Allow me to revise my question: Is there a better way to do this keeping the data in the preferred wide format? – Dan Nov 01 '18 at 11:34
  • See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – IceCreamToucan Nov 01 '18 at 12:54
  • The detour is more efficient than any alternative. Data analysis should usually be conducted with long format (google "tidy data") and R supports that far better than analysis of wide format data. – Roland Nov 01 '18 at 15:40
  • However, if you want to continue with your approach, forget `get` and use `[` instead of `$`. See also, `help("$")`. – Roland Nov 01 '18 at 15:43
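
The `[`-style indexing suggested in the last comment could look roughly like the sketch below. The toy `r5` data frame and its values are invented for illustration; only the column names (`Match`, `NWord_1`, ...) and `Size` come from the question.

```r
# Toy data (hypothetical values) mirroring the question's column names.
r5 <- data.frame(
  Match   = c(1, 2, 3, 2),
  NWord_1 = c("a", "b", "c", "d"),
  NWord_2 = c("e", "f", "g", "h"),
  NWord_3 = c("i", "j", "k", "l"),
  stringsAsFactors = FALSE
)
Size <- 3

for (i in seq_len(Size)) {
  col <- paste0("NWord_", i)   # build the column name as a string
  # `[[` accepts a computed name, which `$` and get() do not
  r5[[col]] <- ifelse(r5$Match == i, NA, r5[[col]])
}
```

Because the column is looked up by name, this variant does not depend on where the `NWord` columns sit in the data frame.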

1 Answer

For those interested: 1) The import data are in wide format (that is what the system allows me to download). 2) The export data have to be in wide format in order to upload to the system. Since these are just a few of some ~200 variables in the data set, going back and forth between wide and long and then long back to wide (possibly multiple times) seems cumbersome and prone to error. Therefore, I came up with this:

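# Locate the first and last NWord columns (assumes they are contiguous and in order).
idx1<-which(colnames(r5)=="NWord_1")
idx2<-which(colnames(r5)==paste("NWord_",Size,sep=""))
for(i in idx1:idx2){
    # Column i holds NWord_k with k = i - idx1 + 1, so compare Match against k.
    k<-i-idx1+1
    r5[,i]<-ifelse(r5$Match==k,NA,r5[,i])
}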

This seems to work just fine; however, I'm not sure that it is the most efficient way to code this.
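
One possible loop-free alternative that keeps the data wide is sketched below. It assumes the columns are named `NWord_1` through `NWord_<Size>` and that `Match` holds the number of the column to blank; the toy data frame, its values, and the `Other` column are invented for illustration.

```r
# Toy data (hypothetical values); column names and Size follow the question.
r5 <- data.frame(
  Match   = c(1, 2, NA, 2),
  NWord_1 = c("a", "b", "c", "d"),
  NWord_2 = c("e", "f", "g", "h"),
  Other   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
Size <- 2

cols <- paste0("NWord_", seq_len(Size))   # "NWord_1", "NWord_2"

# Pair each NWord column with its number and blank the rows where Match equals it.
# which() drops NA comparisons, so rows with a missing Match are left untouched.
r5[cols] <- Map(
  function(column, k) replace(column, which(r5$Match == k), NA),
  r5[cols],
  seq_len(Size)
)
```

Selecting the columns by name rather than by position means the result does not depend on the `NWord` columns being adjacent, and the other columns in the data set are not touched. Whether this is measurably faster than the loop is not something the sketch establishes.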

Dan