Good day. I am looking for help processing my dataset. I have 14000 rows and 500 columns, and I am trying to get the maximum value of the first derivative for each row within different groups of columns. My data are stored in a data frame whose first column holds the sample name. The data look like this:

                 Species   Spec400   Spec405   Spec410   Spec415
1  AfricanOilPalm_1_Lf_1 0.2400900 0.2318345 0.2329633 0.2432734
2 AfricanOilPalm_1_Lf_10 0.1783162 0.1808581 0.1844433 0.1960315
3 AfricanOilPalm_1_Lf_11 0.1699646 0.1722618 0.1615062 0.1766804
4 AfricanOilPalm_1_Lf_12 0.1685733 0.1743336 0.1669799 0.1818896
5 AfricanOilPalm_1_Lf_13 0.1747400 0.1772355 0.1735916 0.1800227

For each sample in the Species column, I want to get the maximum derivative over a range of columns, for example from Spec495 to Spec550. This is what I did before I ran into errors.

x <- c(495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550)  # x values of reflectance (Spec495 to Spec550)

y.data.f <- hsp[, 21:32]  # row values for the required columns

y <- as.numeric(y.data.f[1, ])  # convert to a vector, for just the first row of data

library(pspline)  # using a spline so a derivative may be calculated from a list of numeric values

I really wanted to avoid using a loop because of the time it takes, but this is the only way I know of so far:

for (j in 1:14900) {
  y <- as.numeric(y.data.f[j, ])               # values for row j
  a1d <- max(predict(sm.spline(x, y), x, 1))   # maximum of the first derivative
  write.table(a1d, file = "a1-d-appended.csv", sep = ",",
              col.names = FALSE, append = TRUE)
}

This loop runs up to the 7861st row, and then I get this error:

Error in smooth.Pspline(x = ux, y = tmp[, 1], w = tmp[, 2], method = method,  : 
NA/NaN/Inf in foreign function call (arg 6)

I am sure there must be a way to avoid using a loop, maybe using the plyr package, but I can't figure out how to do so, nor which package would be best to get the value for maximum derivative.

Can anyone offer some insight or suggestions? Thanks in advance

user2507608
  • This post can help http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega – dickoa Jul 11 '13 at 23:50
  • So what's the 7861st value that ends up in this error? Did you try running just that separately with this function? Why do you think this is a problem with the loop? – Arun Jul 11 '13 at 23:52
  • You have a double assignment in the first line of that loop. Intended? Furthermore, you said you wanted the maximum but it looks like your y value will have as many elements as there are rows in `y.data.f[j,]` – IRTFM Jul 12 '13 at 01:11
  • @Arun: the 7861th row values are: ` Spec495 Spec500 Spec505 Spec510 Spec515 Spec520 Spec525 7861 0.2617789 0.2661565 0.27277 0.2873747 0.3093497 0.3368941 0.3611916 Spec530 Spec535 Spec540 Spec545 Spec550 7861 0.3771284 0.3845924 0.3885089 0.3913611 0.3918995` – user2507608 Jul 12 '13 at 01:34
  • @Dwin: Not sure I intended for the double assignment. I basically wanted the loop to run for each row, so I can get the maximum derivative per each row. So yes effectively I am looking for a new column with 14901 maximum derivative values – user2507608 Jul 12 '13 at 01:39
  • When I run sm.spline on those values I get: `max(predict(sm.spline(vec))$ysmth/5) [1] 0.0783802` Maybe the problem lies just above or below? – IRTFM Jul 12 '13 at 01:56
  • Also from Details: "Note that the argument values must be strictly increasing, a condition that is not required by smooth.spline." – IRTFM Jul 12 '13 at 02:21

1 Answer

First differences are the numerical analog of first derivatives when the x-dimension is evenly spaced. So something along the lines of:

 which.max(diff(predict(sm.spline(x, y))$ysmth))

... will return the location of the maximum (positive) slope of the smoothed spline. If you wanted the maximal slope allowing it to be either negative or positive, you would wrap the diff() result in abs(). If you are having difficulties with non-finite values, then indexing with is.finite will clear both Inf and NaN difficulties:

predy <- predict(sm.spline(x, y))$ysmth
predx <- predict(sm.spline(x, y))$x
is.na(predy) <- !is.finite(predy)  # mark Inf and NaN values as NA
plot(predx, predy,  # NA values will not blow up R plotting function,
                    # ...  just create discontinuities.
     main = "First Derivative")
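
To get one value per row without an explicit loop, the first-difference idea can be applied to all rows at once. The following is only a minimal sketch in base R: it works on raw differences with no spline smoothing (so results can differ from the sm.spline approach), the function name `max_slope_by_row` and the small matrix `m` are hypothetical stand-ins for `hsp[, 21:32]`, and the third row is made up to show how missing values are handled.

```r
# Wavelengths for Spec495 ... Spec550
x <- seq(495, 550, by = 5)

# Hypothetical stand-in for hsp[, 21:32]: rows 1 and 2 use values posted in
# the comments, row 3 is invented and contains an NA
m <- rbind(
  c(0.2440790, 0.2505443, 0.2606664, 0.2775037, 0.2983790, 0.3292848,
    0.3609252, 0.3838254, 0.3952052, 0.4002841, 0.4049134, 0.4065415),
  c(0.2617789, 0.2661565, 0.2727700, 0.2873747, 0.3093497, 0.3368941,
    0.3611916, 0.3771284, 0.3845924, 0.3885089, 0.3913611, 0.3918995),
  c(0.2400000, NA,        0.2600000, 0.2700000, 0.2800000, 0.2900000,
    0.3000000, 0.3100000, 0.3200000, 0.3300000, 0.3400000, 0.3500000)
)

max_slope_by_row <- function(m, x) {
  m <- as.matrix(m)
  # Differences between neighbouring columns, computed for every row at once
  dy <- m[, -1, drop = FALSE] - m[, -ncol(m), drop = FALSE]
  # Divide each column of differences by the matching x spacing
  slopes <- sweep(dy, 2, diff(x), "/")
  # Rows containing NA, NaN or Inf get NA instead of raising an error
  ok <- rowSums(!is.finite(m)) == 0
  out <- rep(NA_real_, nrow(m))
  out[ok] <- apply(slopes[ok, , drop = FALSE], 1, max)
  out
}

result <- max_slope_by_row(m, x)
bad_rows <- which(is.na(result))  # rows that held non-finite input values
```

Assuming the columns of `hsp[, 21:32]` are all numeric, `max_slope_by_row(hsp[, 21:32], x)` would return one maximum slope per row, and `bad_rows` would point at rows worth inspecting before passing them to sm.spline.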
IRTFM
  • Thanks much. Using your command above, I get a value of 6 on my data for the first row. However the original answer I got (when I used my command) is `0.006666018` – user2507608 Jul 12 '13 at 01:43
  • Unless you post the first row values with dput(y.data.f[1,]) there is no way we can help. Furthermore, the `diff()` approach does not calculate a denominator and you still have not clarified whether you want absolute values. – IRTFM Jul 12 '13 at 01:44
  • I am very sorry. The first row of values are: ` [1] 0.2440790 0.2505443 0.2606664 0.2775037 0.2983790 0.3292848 0.3609252 [8] 0.3838254 0.3952052 0.4002841 0.4049134 0.4065415 `. Absolute values are not needed. – user2507608 Jul 12 '13 at 02:05
  • > max(diff(predict(sm.spline(seq_along(vec)*5 ,vec))$ysmth)/5) [1] 0.006328027 – IRTFM Jul 12 '13 at 02:16
  • Thanks DWin, I am not sure why you are multiplying/dividing by 5 though, when there are 11 elements. Would it be possible to explain? Thanks. I am deeply appreciative though – user2507608 Jul 12 '13 at 02:39
  • I was attempting to convert to the units in the x-dimension. The diff() result will be y-change per 5 unit increase in x. The diff()/5 result will be the y-change per single unit increase in x. – IRTFM Jul 12 '13 at 19:21
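
The unit conversion described in the last comment can be checked by hand on the first-row values posted earlier in this thread. This sketch uses only base R and takes first differences of the raw values with no spline smoothing, so it is a simplification of the answer's approach rather than a reproduction of it; the variable names are illustrative.

```r
# Wavelengths for Spec495 ... Spec550, evenly spaced 5 units apart
x <- seq(495, 550, by = 5)

# First-row values as posted in the comments
y <- c(0.2440790, 0.2505443, 0.2606664, 0.2775037, 0.2983790, 0.3292848,
       0.3609252, 0.3838254, 0.3952052, 0.4002841, 0.4049134, 0.4065415)

# diff(y) is the change in y per 5-unit step; dividing by diff(x)
# converts it to the change in y per single unit of x
slopes <- diff(y) / diff(x)

steepest  <- which.max(slopes)  # index of the interval with the largest positive slope
max_slope <- max(slopes)        # that slope, in y units per unit of x
```

Here `steepest` is 6 (the interval from 520 to 525), which matches the value of 6 returned by which.max() above, and `max_slope` is about 0.006328, close to the 0.006328027 reported for the smoothed values.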