
I get an error message when I attempt to use apply() conditional on a column of dates to return a set of coefficients.

I have a dataset (herein modified for simplicity, but reproducible):

library(data.table)
library(MASS)  # provides rlm()

Adataset <- data.table(Epoch = c("2007-11-15", "2007-11-16", "2007-11-17",
                       "2007-11-18", "2007-11-19", "2007-11-20", "2007-11-21"),
                       Distance = c(92336.22, 92336.23, 92336.22, 92336.20,
                       92336.19, 92336.21, 92336.18))
Adataset
        Epoch Distance
1: 2007-11-15 92336.22
2: 2007-11-16 92336.23
3: 2007-11-17 92336.22
4: 2007-11-18 92336.20
5: 2007-11-19 92336.19
6: 2007-11-20 92336.21
7: 2007-11-21 92336.18

The analysis begins with establishing start and end dates:

############## Establish dates for analysis
#4.Set date for center of duration
StartDate <- "2007-11-18"
as.numeric(as.Date(StartDate)); StartDate
EndDate <- as.Date(tail(Adataset$Epoch,1)); EndDate

Then I establish time durations for analysis:

#5.Quantify duration of time window
STDuration <-  1
LTDuration  <- 3

Then I write functions to regress over both durations and return the slopes:

# Write STS and LTS functions, each with following steps
#6.Define time window- from StartDate less ShortTermDuration to
#  StartDate plus ShortTermDuration
#7.Define Short Term & Long Term datasets
#8. Run regression over dataset
my_STS_Function <- function (StartDate) {

  STAhead  <- as.Date(StartDate) + STDuration; STAhead
  STBehind <- as.Date(StartDate) - STDuration; STBehind
  STDataset  <- subset(Adataset, as.Date(Epoch) >= STBehind & as.Date(Epoch)<STAhead)
  STResults <- rlm( Distance ~ Epoch, data=STDataset); STResults
  STSummary <- summary( STResults ); STSummary
  # Return coefficient (Slope of regression)
  STNum <- STResults$coefficients[2];STNum
}
my_LTS_Function <- function (StartDate) {
  LTAhead  <- as.Date(StartDate) + LTDuration; LTAhead
  LTBehind <- as.Date(StartDate) - LTDuration; LTBehind
  LTDataset  <- subset(Adataset, as.Date(Epoch) >= LTBehind & as.Date(Epoch)<LTAhead)
  LTResults <- rlm( Distance ~ Epoch, data=LTDataset); LTResults
  LTSummary <- summary( LTResults ); LTSummary
  # Return coefficient (Slope of regression)
  LTNum <- LTResults$coefficients[2];LTNum
}

Then I test the function to make sure it works for a single date:

myTestResult <- my_STS_Function("2007-11-18")

It works, so I move on to apply the function over the range of dates in the dataset:

mySTSResult <- apply(Adataset, 1, my_STS_Function, seq(StartDate : EndDate))

My desired result is a list, array, or vector of slopes (mySTSResult), followed by a separate list/array/vector of long-term slopes (myLTSResult), so that I can compute an STSlope:LTSlope ratio over the duration. Something like this (mySTSResults values fabricated):

> Adataset
    Epoch Distance mySTSResults
1: 2007-11-15 92336.22            3
2: 2007-11-16 92336.23            4
3: 2007-11-17 92336.22            5
4: 2007-11-18 92336.20            6
5: 2007-11-19 92336.19            7
6: 2007-11-20 92336.21            8
7: 2007-11-21 92336.18            9

Only I get this error:

Error in FUN(newX[, i], ...) : unused argument(s) (1:1185)

What is this telling me, and how do I correct it? I've done some looking and cannot find the correction.

Hopefully I've explained this sufficiently. Please let me know if you need further details.

Matt Dowle
remarkableearth
  • your code has a number of small errors and is not reproducible - please fix – eddi Aug 13 '13 at 22:49
  • `my_STS_Function` only has one argument, but you're giving it two: the slice of the array and `StartDate:EndDate` – hadley Aug 13 '13 at 22:57
  • Just to make sure that you understand @hadley's point: each row of `Adataset` will get matched to the first argument of `my_STS_Function` and then there is an attempt to match `seq(StartDate : EndDate)` to the second argument ... except there isn't one. (Furthermore, it should be `seq(StartDate, EndDate)` or just `StartDate : EndDate`.) – IRTFM Aug 14 '13 at 04:43
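
To make the comments above concrete, here is a minimal sketch of how `apply()` forwards its extra arguments to the function it calls. The names `one_arg` and `two_arg` are hypothetical stand-ins, not part of the question's code; the point is only that anything placed after `FUN` is passed along to every call.

```r
# Hypothetical one-argument function standing in for my_STS_Function.
one_arg <- function(row) length(row)

m <- matrix(1:6, nrow = 3)

# Works: each row of m becomes the single argument.
ok <- apply(m, 1, one_arg)

# Fails: 1:5 is forwarded as a second argument that one_arg cannot accept.
# tryCatch captures the error message instead of stopping the script.
res <- tryCatch(apply(m, 1, one_arg, 1:5),
                error = function(e) conditionMessage(e))
print(res)

# Extra arguments are fine when the function declares a matching parameter.
two_arg <- function(row, k) sum(row) * k
scaled <- apply(m, 1, two_arg, k = 10)
```

So the fix is either to drop the extra argument or to give the function a second parameter that actually uses it.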

2 Answers


Ok, it seems the problem is in the additional arguments to my_STS_Function as stated in your apply function call (as you have defined it with only one parameter). The date range is being passed as an additional parameter to that function, and R is complaining that it is unused (a vector of 1185 elements it seems). Are you rather trying to pull a subset of the rows restricted by date range first, then wishing to apply the my_STS_Function? I'd have to think a bit on an exact solution to that.

Sorry - I did my working out in the comments there. A possible solution is this:

subSet <- Adataset[Adataset[,1] %in% seq(StartDate:EndDate),][order(na.exclude(match(Adataset[,1], seq(StartData,EndDate))),]

Adapted from the answer in this question:

R select rows in matrix from another vector (match, %in)
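
As a hedged illustration of the "subset by date range first" idea, the sketch below uses a plain `data.frame` with `Epoch` already parsed as `Date`. That is an assumption: the question stores `Epoch` as character in a `data.table`. The names `df`, `in_range`, and `subSet2` are introduced only for this example.

```r
# Assumption: a data.frame mirroring the question's data, with Epoch as Date.
df <- data.frame(
  Epoch = as.Date("2007-11-15") + 0:6,
  Distance = c(92336.22, 92336.23, 92336.22, 92336.20,
               92336.19, 92336.21, 92336.18)
)
StartDate <- as.Date("2007-11-18")
EndDate   <- max(df$Epoch)

# Option 1: keep rows whose date appears in an explicit daily sequence.
in_range <- df$Epoch %in% seq(StartDate, EndDate, by = "day")
subSet <- df[in_range, ]

# Option 2: compare against the bounds directly; no sequence is needed.
subSet2 <- df[df$Epoch >= StartDate & df$Epoch <= EndDate, ]
```

Both options keep the rows in their original order, so no call to `order()` is required.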

sHtev
  • You could define a subset of your data by excluding everything outside the data range, something like: – sHtev Aug 13 '13 at 23:04
  • `subSet <- Adataset[Adataset[,1] %in% seq(StartDate:EndDate),][order(na.exclude(match(Adataset[,1], seq(StartData,EndDate))),]` – sHtev Aug 13 '13 at 23:09
  • To address your question, "Are you rather trying to pull a subset of the rows restricted by date range first, then wishing to apply the my_STS_Function?", Yes, I wish to limit the data by the date range first, then apply my_STS_Function in order to obtain a list/vector/array of results. – remarkableearth Aug 14 '13 at 03:42
  • When I input your code above, I get the following errors: `Error in StartDate:EndDate : NA/NaN argument`, with the additional warning `In seq(StartDate:EndDate) : NAs introduced by coercion`, and then, for `[order(na.exclude(match(Rdataset[,1], seq(StartDate,EndDate))),]`, `Error: unexpected '[' in " ["`. I thought I had avoided the NaN through the code at the beginning: `as.numeric(as.Date(StartDate)); StartDate` and `EndDate <- as.Date(tail(Adataset$Epoch,1)); EndDate`. – remarkableearth Aug 14 '13 at 03:44
  • ...And adding (or subtracting) brackets doesn't have any effect on the error, so I don't understand the error message. – remarkableearth Aug 14 '13 at 03:51
  • In my second comment above, if you see Rdataset, replace it with Adataset. Sorry for the poor editing. – remarkableearth Aug 14 '13 at 03:53
  • According to help(order), it appears that use of 'order' "returns a permutation which rearranges its first argument into ascending or descending order" but I don't want to sort the data nor do I want to exclude data from the computation of my_STS_function (or my_LTS_function). So how about if I just leave out "seq(StartDate : EndDate)" from "mySTSResult <- apply(Adataset, 1, my_STS_Function, seq(StartDate : EndDate))" so that I only have one argument? That returns a list of values as long as the dataset (that I need to format to a matrix). Can someone validate? – remarkableearth Aug 14 '13 at 04:56
  • your code returns an error of `Error: unexpected '[' in " ["`. I've added and subtracted various front and rear brackets and parentheses but can't eliminate the error to get past it and find out if your suggestion really works. – remarkableearth Aug 14 '13 at 19:14
  • your code `subSet...` returns an error with the bracketing. I've attempted to correct it but to no good. What I'm really interested in is seeing how your suggestion works. Could you fix? – remarkableearth Aug 15 '13 at 19:27
  • needs another bracket (parenthesis) after the last EndDate and before the ",]" - forgot to close the "seq" function call. – sHtev Aug 19 '13 at 15:20
  • Also, to generate sequences of dates, you probably need to use seq.Date rather than normal seq – sHtev Aug 19 '13 at 17:32
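
The `NA/NaN argument` error discussed in these comments can be reproduced in isolation. The sketch below assumes, as in the question, that `StartDate` is still a character string while `EndDate` is a `Date`; `colon_result` and `dates` are names made up for the example.

```r
StartDate <- "2007-11-18"            # character, as in the question
EndDate   <- as.Date("2007-11-21")   # Date

# The colon operator tries to coerce the string to a number, gets NA,
# and stops with an error. tryCatch records that instead of halting.
colon_result <- tryCatch(
  suppressWarnings(StartDate:EndDate),
  error = function(e) "error"
)

# Converting to Date first and asking for daily steps gives a Date vector.
dates <- seq(as.Date(StartDate), EndDate, by = "day")
print(dates)
```

Note that calling `as.numeric(as.Date(StartDate))` without assigning the result leaves `StartDate` unchanged, which is why the string was still being passed to `:`.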

Adding this as a new answer as the previous one was getting confused. A previous commenter was correct, there are bugs in your code, but they aren't a sticking point.

My updated approach was to use seq.Date to generate the date sequence (only works if you have a data point for each day between the start and end - though you could use na.exclude as above):

dates = seq.Date(as.Date(StartDate),as.Date(EndDate),"days")

You then use this as the input to apply, with some munging of types to get things working correctly (I've done this with a lambda, i.e. anonymous, function):

mySTSResult <- apply(as.matrix(dates), 1, function(x) {class(x) <- "Date"; my_STS_Function(x)})

Then hopefully you should have a vector of the results, and you should be able to do something similar for LTS, and then manipulate that into another column in your original data frame/matrix.
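
For the last step, here is one possible sketch of computing both slopes per date and attaching them as columns. It makes several assumptions that are not in the original: a plain `data.frame` with `Epoch` as `Date`, a hypothetical helper `window_slope`, and `lm()` swapped in for `MASS::rlm()` purely to keep the example dependency-free.

```r
# Assumption: data.frame stand-in for Adataset with Epoch stored as Date.
df <- data.frame(
  Epoch = as.Date("2007-11-15") + 0:6,
  Distance = c(92336.22, 92336.23, 92336.22, 92336.20,
               92336.19, 92336.21, 92336.18)
)

# Hypothetical helper: slope of Distance over time in the window
# [center - half_width, center + half_width), matching the question's bounds.
# lm() is a stand-in for rlm() here, not the asker's method.
window_slope <- function(center, half_width, data) {
  win <- data[data$Epoch >= center - half_width &
              data$Epoch <  center + half_width, ]
  if (nrow(win) < 2) return(NA_real_)   # a slope needs at least two points
  win$t <- as.numeric(win$Epoch)        # days since 1970-01-01
  unname(coef(lm(Distance ~ t, data = win))[2])
}

# Index-based iteration hands each date to the helper with its Date class
# intact, unlike apply() on a matrix, which strips the class.
df$STSlope <- vapply(seq_along(df$Epoch),
                     function(i) window_slope(df$Epoch[i], 1, df), numeric(1))
df$LTSlope <- vapply(seq_along(df$Epoch),
                     function(i) window_slope(df$Epoch[i], 3, df), numeric(1))
df$SlopeRatio <- df$STSlope / df$LTSlope
```

Dates near the edges of the data have fewer points in their window, so the first short-term slope comes back as `NA` rather than raising an error.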

sHtev