0

Hello I have a user defined function lambda , into which I am passing age (integer), gender (integer) and a file (BMI).. When I call it I get some value, which is wrong. But when I execute each statements in the function I get the right answer. Strange thing is

  1. after I executing single statements from the function and getting the expected answer, if I call the function again, I get the right answer.
  2. Then if change the values when calling the function, it gives the old answer (of step #1). And it never changes until I execute single statements from the function (as in the step #1)

Can someone please help me..

lambda <-function(ag, gnd, ibmi) {

        sel <- subset(ibmi, age==ag & gender == gnd)
        out <- boxcox(lm(sel$bmi~1))
        rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
        return (rn[1] + rn[2])/2
    }

This is how I call the function..

> lambda(14,1,bmi)
[1] -0.4040404

Executing each statements frm the fn

> sel <- subset(bmi, age==14 & gender == 1)
> out <- boxcox(lm(sel$bmi~1))
> rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
> (rn[1] + rn[2])/2
[1] -0.3636364

Again I call the function

> lambda(14,1,bmi)
[1] -0.3636364

Calling function with different value for Age

> lambda(17,1,bmi)
[1] -0.3636364

> dput(head(bmi[,c(2,3,4,5,6,7,8,9)], 50))
structure(list(age = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6), gender = c(2L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), diabetic = structure(c(3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "No", "Yes"
), class = "factor"), height = c(117, 117.5, 123, 133.5, 110.2, 
126, 119, 113.8, 125.8, 115.6, 124.1, 117.8, 128, 117, 121, 116.6, 
111.3, 116, 118, 123, 119.5, 118.3, 121.5, 116.4, 121, 110, 114, 
116, 113.5, 115.5, 111.7, 116.5, 121.6, 129.9, 115.3, 116, 116.5, 
122.2, 123.4, 118, 117, 107.2, 116.5, 126.6, 118.2, 119.6, 141, 
118.7, 150.6, 113), weight = c(21.2, 21, 19.2, 36.7, 18.1, 42.6, 
36.2, 18.6, 25.1, 20.1, 40.2, 29.7, 32.6, 20.4, 24.5, 20.9, 17.3, 
22, 21.4, 23.5, 27.5, 23.3, 33.4, 22.2, 24, 16.9, 18.8, 23, 20.1, 
20.8, 26.4, 21.5, 41.2, 47.1, 31.4, 18.9, 27.1, 33.6, 26.2, 21.3, 
20, 17.6, 22.7, 24.3, 19.9, 32, 45.4, 21.4, 20.9, 21.2), bmi = c(15.49, 
15.21, 12.69, 20.59, 14.9, 26.83, 25.56, 14.36, 15.86, 15.04, 
26.1, 21.4, 19.9, 14.9, 16.73, 15.37, 13.97, 16.35, 15.37, 15.53, 
19.26, 16.65, 22.63, 16.39, 16.39, 13.97, 14.47, 17.09, 15.6, 
15.59, 21.16, 15.84, 27.86, 27.91, 23.62, 14.05, 19.97, 22.5, 
17.21, 15.3, 14.61, 15.32, 16.73, 15.16, 14.24, 22.37, 22.84, 
15.19, 9.22, 16.6), systolic = c(108L, 104L, 93L, 103L, 99L, 
103L, 130L, 105L, 131L, 102L, 104L, 137L, 129L, 84L, 116L, 101L, 
108L, 101L, 92L, 111L, 100L, 112L, 107L, 118L, 112L, 104L, 108L, 
140L, 100L, 78L, 103L, 100L, 119L, 106L, 110L, 116L, 100L, 102L, 
96L, 90L, 122L, 104L, 110L, 96L, 106L, 106L, 146L, 120L, 102L, 
116L), diastolic = c(68L, 69L, 65L, 76L, 85L, 73L, 90L, 80L, 
70L, 80L, 78L, 113L, 109L, 64L, 97L, 76L, 63L, 55L, 64L, 86L, 
80L, 94L, 68L, 90L, 77L, 75L, 68L, 120L, 70L, 57L, 70L, 75L, 
78L, 90L, 70L, 67L, 75L, 70L, 68L, 82L, 80L, 71L, 70L, 62L, 71L, 
78L, 87L, 80L, 79L, 89L)), .Names = c("age", "gender", "diabetic", 
"height", "weight", "bmi", "systolic", "diastolic"), row.names = c(NA, 
50L), class = "data.frame")
> 
arshad
  • 393
  • 1
  • 11
  • perhaps you can try `return((rn[1] + rn[2])/2)`? not sure it will fix your problem but in your code brackets in the `return` line are misplaced – Vincent Bonhomme May 26 '16 at 07:59
  • @Vincent , I had tried that way as well.. but no success.. I have edited the question for more clarity – arshad May 26 '16 at 08:01
  • perhaps `lm(sel~1, bmi)`? could you make your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610)? e.g. provide the result of `dput(bmi)`? – Vincent Bonhomme May 26 '16 at 08:05
  • @Vincent, I just edited my question .. added `dput(bmi)` – arshad May 26 '16 at 08:24
  • Do not use `subset` inside a function; its documentation explicitly warns against using it non-interactively. Use `[` for subsetting. Also, use the `data` argument of `lm`. – Roland May 26 '16 at 08:28
  • @arshad: I don't think so, that's `dput(sel$bmi)`, I think we need `dput(bmi)`. And see @Roland's comment: that was my fear – Vincent Bonhomme May 26 '16 at 08:39
  • @Ronald , `sel <- ibmi[ibmi$age== 10 & ibmi$gender == 1,]` `out <- boxcox(lm(bmi~1, data= sel))`.. I made these changes in the function.. But still I facing same issue.. – arshad May 26 '16 at 08:48
  • @Vincent, `bmi` is big file of around 6000+ records.. can I put it here? – arshad May 26 '16 at 08:49
  • mayeb just some of it? – Vincent Bonhomme May 26 '16 at 08:54
  • @Vincent, I just added `head(bmi)` and `summary(bmi)` – arshad May 26 '16 at 09:17
  • it's helpless to reproduce the example. we really need `dput()`. perhaps `dput(head(bmi, 50))`? – Vincent Bonhomme May 26 '16 at 09:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/113009/discussion-between-arshad-and-vincent-bonhomme). – arshad May 26 '16 at 09:37

3 Answers3

2

There are several (potential) issues:

  1. subset should not be used inside functions. Use [ for subsetting.

  2. Scoping issue are common if you pass a fitted lm model to functions. Avoid that and use the boxcox formula method.

  3. You had missing parens in your return statement. return is a function in R.

Thus:

lambda <-function(ag, gnd, ibmi) {
  require(MASS)
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd,]
  out <- boxcox(bmi~1, data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return((rn[1] + rn[2])/2)
}

lambda(6,2,bmi)
#[1] -0.4040404

sel <- subset(bmi, age==6 & gender == 2)
out <- boxcox(lm(sel$bmi~1))
rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
(rn[1] + rn[2])/2
#[1] -0.4040404
Roland
  • 127,288
  • 10
  • 191
  • 288
  • Side note: `return` is absolutely not needed as last statement in a function, it can be more readable with it but it adds a small overhead when the interpreter looks for the function before doing the usual return call at end of function. – Tensibai May 26 '16 at 13:03
1

I think the issue is when passing the lm class object to the boxplot. Just avoid using lm altogether and it works.

lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd, ]
  out <- boxcox(bmi~1, data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return ((rn[1] + rn[2])/2)
}

> lambda(6,2,bmi)
[1] -0.4040404

As seen, the lm is actually redundant. Although we can still use lm first, we must define the data both in the lm and boxplot function to make it work properly

lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd, ]
  out <- boxcox(lm(bmi~1, data = sel), data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return ((rn[1] + rn[2])/2)
}
zyurnaidi
  • 2,143
  • 13
  • 14
0

Given all the comments, and with your example, here is a working function:

library(MASS)
lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd,]
  out <- boxcox(lm(bmi~1, sel))
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return(mean(rn))
}

eg:

lambda(6,2,bmi)
Vincent Bonhomme
  • 7,235
  • 2
  • 27
  • 38