
I am working with a WHO macro for transforming anthropometric parameters into Z-scores.

For the purpose of this question: calling the who2007 function requires passing the name of the data frame followed by the bare names of the variables (columns), much as in ggplot2's aes(). The problem is that, if the column is named Age, entering argument=Age is different from entering argument='Age': the former returns a double, but the latter returns a list. I assume this is the difference between df$Age and df['Age'].
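Judging from the source in the addendum, who2007() captures each column argument with deparse(substitute()), which converts the unevaluated argument into text instead of evaluating it. A small stand-in suggests what that yields for the three ways of passing a column. captured_name() is a hypothetical helper, not part of the WHO macro, and the data frame is made up:

```r
# Hypothetical helper that captures its argument the way who2007() does
captured_name <- function(age) deparse(substitute(age))

captured_name(Age)    # returns "Age": the bare name, as intended
captured_name("Age")  # returns "\"Age\"": the quote characters become part of the text
col <- "Age"
captured_name(col)    # returns "col": the variable's own name, not its value

# Indexing a data frame with the quoted text fails
kids <- data.frame(Age = c(60, 72))
res <- tryCatch(kids[, captured_name("Age")],
                error = function(e) conditionMessage(e))
res                   # "undefined columns selected"
```

If the real function behaves the same way, the failure would come from the column lookup with a mangled name, and passing a variable such as cols[i] would not work either, since only its name is captured.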

I have a character vector of column names and need to run the same code several times, using a different column each time. If I pass the entries of that vector one at a time, the function throws an error, because internally it encounters a list instead of a double. How can I get around this? One way I can think of is to use column numbers, identifying them with grep() or similar, but is there a better method?
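Without touching the function, one way to feed it names stored in a character vector is to build the call yourself: as.name() turns a string into a symbol, and bquote() splices that symbol into the call, so the function sees exactly what it would see if the name had been typed by hand. A minimal sketch; who_like() is a hypothetical stand-in that reuses the get(deparse(substitute())) idiom from the addendum, and the data and column names are made up:

```r
# Hypothetical stand-in for who2007(): same argument-capturing idiom
who_like <- function(mydf, age) {
  as.double(get(deparse(substitute(mydf)))[, deparse(substitute(age))])
}

kids <- data.frame(AgeMonths = c(60, 72), AgeAtVisit2 = c(66, 78))
age_cols <- c("AgeMonths", "AgeAtVisit2")

results <- lapply(age_cols, function(col) {
  # bquote() builds e.g. who_like(mydf = kids, age = AgeMonths)
  cl <- bquote(who_like(mydf = kids, age = .(as.name(col))))
  eval(cl)
})
names(results) <- age_cols
```

With the real function, the same pattern should extend to the other column arguments by splicing one .(as.name(...)) per argument; this has not been tested against the WHO macro itself.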

ADDENDUM

Here is the function source code (the part of it that I think explains the problem):

who2007 <- function(FileLab="Temp",FilePath="C:\\Documents and Settings",mydf,sex,age,weight,height,oedema=rep("n",dim(mydf)[1]),sw=rep(1,dim(mydf)[1])) {

#############################################################################
###########   Calculating the z-scores for all indicators
#############################################################################

   old <- options(warn=(-1))

   sex.x<-as.character(get(deparse(substitute(mydf)))[,deparse(substitute(sex))])
   age.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(age))])
   weight.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(weight))])
   height.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(height))])
   if(!missing(oedema)) oedema.vec<-as.character(get(deparse(substitute(mydf)))[,deparse(substitute(oedema))]) else oedema.vec<-oedema
   if(!missing(sw)) sw<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(sw))]) else sw<-as.double(sw)
   sw<-ifelse(is.na(sw),0,sw)

    sex.vec<-NULL
   sex.vec<-ifelse(sex.x!="NA" & (sex.x=="m" | sex.x=="M" | sex.x=="1"),1,ifelse(sex.x!="NA" & (sex.x=="f" | sex.x=="F" | sex.x=="2"),2,NA))
    age.vec<-age.x
    height.vec<-height.x
   oedema.vec<-ifelse(oedema.vec=="n" | oedema.vec=="N","n",ifelse(oedema.vec=="y" | oedema.vec=="Y","y","n"))

   mat<-cbind.data.frame(age.x,as.double(sex.vec),weight.x,height.x,oedema.vec,sw,stringsAsFactors=F)
    names(mat)<-c("age.mo","sex","weight","height","oedema","sw")

    mat$cbmi<-mat$weight/((height.vec/100)^2)
    mat$zhfa<-NULL
    mat$fhfa<-NULL
    mat$zwfa<-NULL
    mat$fwfa<-NULL
    mat$zbfa<-NULL
    mat$fbfa<-NULL

#############################################################################
###########   Calculating the z-scores for all indicators
#############################################################################

cat("Please wait while calculating z-scores...\n") 

### Height-for-age z-score

mat<-calc.zhfa(mat,hfawho2007)

### Weight-for-age z-score

mat<-calc.zwei(mat,wfawho2007)

### BMI-for-age z-score

mat<-calc.zbmi(mat,bfawho2007)


#### Rounding the z-scores to two decimals

            mat$zhfa<-rounde(mat$zhfa,digits=2)
            mat$zwfa<-rounde(mat$zwfa,digits=2)
            mat$zbfa<-rounde(mat$zbfa,digits=2)

#### Flagging z-score values for individual indicators

            mat$fhfa<-ifelse(abs(mat$zhfa) > 6,1,0)
            mat$fwfa<-ifelse(mat$zwfa > 5 | mat$zwfa < (-6),1,0)
            mat$fbfa<-ifelse(abs(mat$zbfa) > 5,1,0)

if(is.na(mat$age.mo) & mat$oedema=="y") {
mat$fhfa<-NA
mat$zwfa<-NA
mat$zbfa<-NA
}

mat<-cbind.data.frame(mydf,mat[,-c(2:6)])
# ... (remainder of the function omitted)

ADDENDUM 2

The script is also intended to be run by multiple users, for whom modifying the function's source code might not be possible. Is there a way to do this without modifying the source code?
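One caveat when calling the unmodified function programmatically: get() looks the data frame up by name starting from the function's own frame and then its enclosing environment (the global environment for a function defined at top level), not from the caller's frame. A data frame that exists only inside another function is therefore not found. A sketch with a hypothetical stand-in who_like() and made-up data; giving a local copy of the function the caller's environment is one workaround that leaves the original source untouched, assuming the macro's helper functions and reference tables live in the global environment:

```r
# Hypothetical stand-in using the same lookup idiom as who2007()
who_like <- function(mydf, age) {
  as.double(get(deparse(substitute(mydf)))[, deparse(substitute(age))])
}

run_local <- function(col) {
  visits <- data.frame(AgeMonths = c(60, 72))  # exists only in this frame
  cl <- bquote(who_like(mydf = visits, age = .(as.name(col))))
  tryCatch(eval(cl), error = function(e) conditionMessage(e))
}

run_local_fixed <- function(col) {
  visits <- data.frame(AgeMonths = c(60, 72))
  f <- who_like                    # local copy; the original is unchanged
  environment(f) <- environment()  # get() inside f now also searches this frame
  eval(bquote(f(mydf = visits, age = .(as.name(col)))))
}

run_local("AgeMonths")        # returns "object 'visits' not found"
run_local_fixed("AgeMonths")  # returns c(60, 72)
```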

stochastic13
  • If you're modifying every column, you could use dplyr's mutate_all(). – Russ Hyde Mar 16 '18 at 07:54
  • @RussHyde No, only a small subset of it. – stochastic13 Mar 16 '18 at 08:08
  • Please provide an example of your function `who2007`. It doesn't have to be the complete function, just enough that we can see the input arguments and the data-subsetting part. I am guessing you could pass the column names as a character vector and, inside your function, use `lapply` to loop through the columns? – zx8754 Mar 16 '18 at 08:21
  • @zx8754 I added the function. I did not get the second part of your comment. – stochastic13 Mar 16 '18 at 08:26
  • I guess you can rewrite the function to accept strings to suit your need instead of using a series of `get` & `deparse(substitute())` – Tung Mar 16 '18 at 08:33
  • @Tung I don't understand how deparse() and the argument handling in the function work. Can you tell me what to change, or explain what the current code does, so that I can modify it? – stochastic13 Mar 16 '18 at 08:34
  • https://stackoverflow.com/questions/45176431/extract-name-of-data-frame-in-r-as-character/45176503 – Tung Mar 16 '18 at 08:44
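Tung's suggestion amounts to replacing each get(deparse(substitute(...))) with ordinary indexing by string. A minimal sketch of only the argument-handling step, assuming the rest of who2007() needs just the age.mo, sex, weight and height values it builds into mat; who_se() is a hypothetical name, and the data are made up:

```r
# Hypothetical standard-evaluation variant of the argument handling:
# mydf is the data frame object itself and the column arguments are strings
who_se <- function(mydf, sex, age, weight, height) {
  needed <- c(sex, age, weight, height)
  absent <- setdiff(needed, names(mydf))
  if (length(absent) > 0) {
    stop("columns not found in mydf: ", paste(absent, collapse = ", "))
  }
  # [[ ]] with a string returns the column as a plain vector
  data.frame(
    age.mo = as.double(mydf[[age]]),
    sex    = as.character(mydf[[sex]]),
    weight = as.double(mydf[[weight]]),
    height = as.double(mydf[[height]]),
    stringsAsFactors = FALSE
  )
}

# Example data with two alternative age columns
dat <- data.frame(Sex = c("m", "f"), AgeA = c(60, 72), AgeB = c(61, 73),
                  Wt = c(18, 20), Ht = c(108, 115))

# Column names held in a character vector can be passed directly
out <- lapply(c("AgeA", "AgeB"), function(a) {
  who_se(dat, sex = "Sex", age = a, weight = "Wt", height = "Ht")
})
```

Because the data frame is passed as an object rather than looked up by name with get(), this version also works when the data frame exists only inside another function.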

1 Answer

We could test whether the input data frame has the required columns, and then get rid of the get(deparse(substitute())) step, e.g.:

who2007 <- function(FileLab = "Temp", FilePath = "C:\\Documents and Settings",
                    mydf,
                    oedema = rep("n",dim(mydf)[1]),sw=rep(1,dim(mydf)[1])) {

  if(!all(c("sex", "age", "weight", "height") %in% colnames(mydf))) stop("mydf, must have 'sex', 'age', 'weight', 'height' columns")

  sex.x <- mydf$sex
  age.x <- mydf$age
  # ...
  # some code
  # ...

  #return
  list(sex.x, age.x)
}

Testing:

#example dataframe   
x <- head(mtcars)

# this errors as required columns are missing
who2007(mydf = x)
# Error in who2007(mydf = x) : 
#   mydf, must have 'sex', 'age', 'weight', 'height' columns

# now update columns with required column names, and it works fine:
colnames(x)[1:4] <- c("sex", "age", "weight", "height")
who2007(mydf = x)
# [[1]]
# [1] 21.0 21.0 22.8 21.4 18.7 18.1
# 
# [[2]]
# [1] 6 6 4 6 8 6
zx8754
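With the answer's version, the column names are fixed, so looping over several candidate columns means copying each one onto the expected name first. A minimal sketch; who_fixed() is a hypothetical stand-in that keeps only the answer's column check, and the data and column names are made up:

```r
# Hypothetical stand-in for the answer's who2007(): expects fixed column names
who_fixed <- function(mydf) {
  if (!all(c("sex", "age", "weight", "height") %in% colnames(mydf)))
    stop("mydf must have 'sex', 'age', 'weight', 'height' columns")
  mydf$age
}

# Example data with two alternative age columns
dat <- data.frame(sex = c("m", "f"), age_visit1 = c(60, 72),
                  age_visit2 = c(66, 78), weight = c(18, 20),
                  height = c(108, 115))

age_cols <- c("age_visit1", "age_visit2")

results <- lapply(age_cols, function(col) {
  tmp <- dat
  tmp$age <- tmp[[col]]  # copy the chosen column into the name the function expects
  who_fixed(tmp)
})
names(results) <- age_cols
```

Copying rather than renaming leaves the original columns in place, so the output still shows which source column was used.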