0

When I run the following code without commenting gr.ascent(MMSE, 0.5, verbose=TRUE) I receive this error Error in b1 * x : 'b1' is missing but when I comment that line I receive the following error when testing MMSE with these arguments MMSE(2,1,farmland$farm,farmland$area). Do you know where my problem is lying?

Error in if (abs(t[i]) <= k) { : missing value where TRUE/FALSE needed

Here's my code:

farmland <- read.csv("FarmLandArea.csv")
str(farmland)
fit=lm(farm~land,data=farmland)
mean.squared.residuals <- sum((lm(farm~land,data=farmland)$residuals)^2)/(length(farmland$farm)-2)

#gradient descent

#things I should possibly use: solve(t(X)%*%X, t(X)%*%y)
gr.ascent<- function(df, x0, alpha=0.2, eps=0.001, max.it = 50, verbose = FALSE){
  X1 <- x0
  cond <- TRUE
  iteration <- 0
  if(verbose) cat("X0 =",X1,"\n")
  while(cond){
    iteration <- iteration + 1
    X0 <- X1
    X1 <- X0 + alpha * df(X0)
    cond <- sum((X1 - X0)^2) > eps & iteration < max.it
    if(verbose) cat(paste(sep="","X",iteration," ="), X1, "\n")
  }
  return(X1)
}


k=19000

#rho <- function(t, k=19000){
#  for (i in seq(1,length(t))){
#    if (abs(t[i]) <= k)
#      return(t[i]^2)
#    else 
#      return(2*k*abs(t[i])-k^2)

#  }

#}

#nicer implementation of rho. ifelse works on vector
rho<-function(t,k) ifelse(abs(t)<=k,t^2,(2*k*abs(t))-k^2)
rho.prime <- function(t, k=19000){
  out <- rep(NA, length(t))
  for (i in seq(1,length(t))){
    if (abs(t[i]) <= k)
    { print(2*t[i])
      out[i] <- 2*t[i] 
    }
    else 
    {
      print(2*k*sign(t[i]))
      out[i] <- 2*k*sign(t[i])
    }
  }
  return(out)
}
MMSE <- function(b0, b1, y=farmland$farm, x=farmland$land){
   # Calls rho.prime() here with argument y-b0-b1*x


   #Why should we call rho.prime? in the html page you have used rho!?
  n = length(y)
  total = 0
  for (i in seq(1,n)) {
    #total = total + rho(t,k)*(y[i]-b0-b1*x[i])
    total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])
  }
  return(total/n)
}

gr.ascent(MMSE(1,2), 0.5, verbose=TRUE)

In which FarmLand csv data is like the following:

state,land,farm
Alabama,50744,14062
Alaska,567400,1375
Arizona,113635,40781
Arkansas,52068,21406
California,155959,39688
Colorado,103718,48750
Connecticut,4845,625
Delaware,1954,766
Florida,53927,14453
Georgia,57906,16094
Hawaii,6423,1734
Idaho,82747,17812
Illinois,55584,41719
Indiana,35867,23125
Iowa,55869,48125
Kansas,81815,72188
Kentucky,39728,21875
Louisiana,43562,12578
Maine,30862,2109
Maryland,9774,3203
Massachusetts,7840,812
Michigan,58110,15625
Minnesota,79610,42031
Mississippi,46907,17422
...

Here's the result of dput(farmland):

> dput(farmland)
structure(list(state = structure(1:50, .Label = c("Alabama", 
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", 
"Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", 
"Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", 
"Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", 
"New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", 
"Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", 
"South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", 
"Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", 
"Wyoming"), class = "factor"), land = c(50744L, 567400L, 113635L, 
52068L, 155959L, 103718L, 4845L, 1954L, 53927L, 57906L, 6423L, 
82747L, 55584L, 35867L, 55869L, 81815L, 39728L, 43562L, 30862L, 
9774L, 7840L, 58110L, 79610L, 46907L, 68886L, 145552L, 76872L, 
109826L, 8968L, 7417L, 121356L, 47214L, 48711L, 68976L, 40948L, 
68667L, 95997L, 44817L, 1045L, 30109L, 75885L, 41217L, 261797L, 
82144L, 9250L, 39594L, 66544L, 24230L, 54310L, 97105L), farm = c(14062L, 
1375L, 40781L, 21406L, 39688L, 48750L, 625L, 766L, 14453L, 16094L, 
1734L, 17812L, 41719L, 23125L, 48125L, 72188L, 21875L, 12578L, 
2109L, 3203L, 812L, 15625L, 42031L, 17422L, 45469L, 95000L, 71250L, 
9219L, 734L, 1141L, 67500L, 10938L, 13438L, 61875L, 21406L, 55000L, 
25625L, 12109L, 109L, 7656L, 68281L, 17031L, 203750L, 17344L, 
1906L, 12578L, 23125L, 5703L, 23750L, 47188L)), .Names = c("state", 
"land", "farm"), class = "data.frame", row.names = c(NA, -50L
))
Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
  • 1
    Your error seems to be in the line specified: `if (abs(t[i]) <= k)`. It appears your comparison is returning `NA` instead of one of `TRUE/FALSE`. You are probably running into something analogous to: `if(NA < 1) "do this"` which gives the same error. – thelatemail Apr 22 '14 at 03:10
  • 1
    But why should NA happen here? Can you explain a little more? – Mona Jalal Apr 22 '14 at 03:16
  • can you paste the result of 'dput(farmland)' in your question? – RockScience Apr 22 '14 at 03:25
  • Your code looks like it is obfuscated... Two advices here: it is optimistic to comment out some line that throws an error and hope that the remainder of the code will work just fine; set `options(error=recover)` to trigger the debugger when an error is found, so you can investigate and understand the problem more easily. – Jealie Apr 22 '14 at 03:34

1 Answers1

2

OK, by the numbers:

  1. In your call to gr.ascent(...) you pass a function, MMSE as the first argument. Inside gr.ascent(...) you refer to this function as df(...).
  2. The function MMSE(...) has 2 arguments, b0 and b1 for which there are no defaults - so these must be specified or there will be an error, but
  3. When you call the function df(...) inside gr.ascent(...), in the line: X1 <- X0 + alpha * df(X0) you pass only 1 argument, which is b0.
  4. So the second argument, b1 is missing, hence the error.

When you call MMSE(...) directly, as in:

MMSE(2,1,farmland$farm,farmland$area)

you pass farmland$area as the 4th argument. But there is no column area in the farmland data frame! So this gets passed as NA, which, when used in

total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])

coerces the t argument to rho.prime(...) to NA, hence the second error.

I can't suggest a solution because I have no clue what you are trying to accomplish here.

EDIT (Response to OP's comment).

Notwithstanding @thelatemail's comment, which I agree with wholeheartedly, your new error is rather obscure.

In your earlier version, you were passing a function, MSEE(...) to gr.ascent(...), and using it incorrectly. This time, you are passing a value to gr.ascent(...), that value being the return value when you call MSEE(1,2). So what happens when you try to treat this value as a function, as in:

X1 <- X0 + alpha * df(X0)

Well, normally this would throw an error telling you that df is not a function. In this case, it's just your bad luck that df is a function. It is the probability density function for the F distribution, which has a required argument df1, among others (type ?df to see the documentation). That's why you are getting the error.

To "fix" this you need to go back to passing the function, as in:

gr.ascent(MSEE,...)

and then use it correctly inside gr.ascent(...), as in:

X1 <- X0 + alpha * df(X0, <some other argument>).
jlhoward
  • 58,004
  • 7
  • 97
  • 140
  • I fixed the error of running MMSE but gr.ascent gives me error: ` Error in df(X0) : argument "df1" is missing, with no default ` any idea why? Can you please take a look at the updated post? – Mona Jalal Apr 22 '14 at 04:45
  • 4
    @MonaJalal - I think you will be well served by taking on Jealie's advice and doing some careful, systematic debugging of your own - line by line. You would probably have little need to ask this question if you gradually stepped through your problem using R's debugging tools. It would also aid in simplifying your code so that when you do hit a wall, you are not pasting 70 odd lines of code and hoping someone else will be able to pick out where it all breaks down. – thelatemail Apr 22 '14 at 04:54