I am new to R and I have tried searching for an answer. I read about quantile and the "partial" argument of sort, but my apologies if I am missing something obvious. I'm wondering if there is a way of doing the following:

  1. Take an unsorted data set and sort it on x
  2. Throw away the top N data points
  3. Throw away the bottom N data points
  4. Perform a regression

For example, if I have 400 data points, I might want to throw away the top 5 and bottom 5 data points (not the values in the top 5% etc., which I believe is what quantile would do).

Here's the code I have so far for performing the regression (some of the "if"-statements get a little complicated, so I left most of them out to try to simplify things):

Everything is in the dataframe "dependencies."

    myY <- dependencies$yValue
    myX0 <- dependencies$xValue
    if (timeInterval == 0) {
      cat("A", "\n")
      myY <- dependencies$yValueAlternate
    } else if (timeInterval == 1) {
      myX1 <- dependencies$xValueAlternate
    }

    ## Add truncation step

    if (timeInterval == 0) {
      myLm <- lm(myY ~ myX0, dependencies)
    } else if (timeInterval == 1) {
      myLm <- lm(myY ~ myX0 + myX1, dependencies)
    }
    print(myLm)
    intercept <- coef(myLm)["(Intercept)"]
    beta1 <- coef(myLm)["myX0"]

Thanks for reading and any advice/direction you can give.

imomushi8
  • Is your data in a data frame? Or just in vectors? And what variable do you want to throw out the top and bottom of? If everything is in one data frame then truncation would be as simple as `trunc_data = head(tail(your_data[order(your_data$variable_to_order_by), ], -5), -5)`. Then you can just use that data frame in the `data` argument of `lm`. But I can't tell what's what in your example. – Gregor Thomas Nov 14 '17 at 18:55
  • Maybe read on [how to make a reproducible example in R](https://stackoverflow.com/q/5963269/903061) and modify your question accordingly. – Gregor Thomas Nov 14 '17 at 18:56
  • If your dataset is named `data`, (1) `data[order(data$x), ]`; (2) `head(data, N)`; (3) `tail(data, N)`. As for (4) it seems you already know how to perform a regression. BTW, I do not understand the relation between your problem description and the code you posted. – Rui Barradas Nov 14 '17 at 18:56
  • @RuiBarradas To throw out points, use `-N`, not `N` (and reverse the head/tail order). – Gregor Thomas Nov 14 '17 at 18:58
  • @Gregor Sorry about that, I edited to say it is all in one data frame. You may have already given me the solution. Will try your head/tail line and will read up on your link (and thank you so much) – imomushi8 Nov 14 '17 at 19:02
  • Okay - but also *don't* pull out vectors like `myY` and `myX0` before truncation - truncating the data frame won't change a vector you've already extracted. You are best off if the variables you use in the model formula are the ones in your data, not vectors you've pulled out. If you are trying to switch out different responses or predictors, build the formula with `paste()` instead. – Gregor Thomas Nov 14 '17 at 19:23
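A minimal sketch of what the comments above describe. The column names are taken from the question, but the data is simulated purely for illustration, and `N`, `sorted`, `trimmed`, `response`, `predictors` and `f` are names assumed here, not from the original code.

```r
set.seed(1)
# Simulated stand-in for the question's data frame (values are made up)
dependencies <- data.frame(
  xValue          = rnorm(400),
  xValueAlternate = rnorm(400),
  yValue          = rnorm(400),
  yValueAlternate = rnorm(400)
)
N <- 5
timeInterval <- 1  # hypothetical value for the switch used in the question

# Sort on x, then drop the last N rows (largest x) and the first N rows (smallest x)
sorted  <- dependencies[order(dependencies$xValue), ]
trimmed <- tail(head(sorted, -N), -N)
nrow(trimmed)

# Choose response and predictors by column name instead of extracting vectors
response   <- if (timeInterval == 0) "yValueAlternate" else "yValue"
predictors <- if (timeInterval == 1) c("xValue", "xValueAlternate") else "xValue"
f <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))

# Fit on the truncated data frame
myLm <- lm(f, data = trimmed)
intercept <- coef(myLm)["(Intercept)"]
beta1     <- coef(myLm)["xValue"]
```

Because the formula refers to columns of `trimmed`, the truncation actually reaches the model, and the coefficient is then named `xValue` rather than `myX0`.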

1 Answer

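If you are deliberately pulling vectors out of the data frame, this may be helpful. The parentheses inside the brackets are important: `:` has higher precedence than `+` and `-`, so without them `1+n : length(x) - n` is evaluated as `1 + (n:length(x)) - n`.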

    x <- sample(1:10000, 500, replace = TRUE)
    n <- 5
    # sort first so the positions dropped are the n smallest and n largest values
    z <- sort(x)[(1 + n):(length(x) - n)]
    length(z)
    # [1] 490
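The same positional indexing carries over to data frame rows. A sketch, assuming a hypothetical data frame `df` with a column `x` (neither is from the question), using `order()` so that the dropped positions really are the smallest and largest values:

```r
set.seed(42)
# Hypothetical data, made up for illustration
df <- data.frame(x = sample(1:10000, 500, replace = TRUE))
n <- 5

ord  <- order(df$x)                      # row indices from smallest to largest x
keep <- ord[(1 + n):(length(ord) - n)]   # drop n positions from each end
trimmed <- df[keep, , drop = FALSE]      # keep it a data frame for lm(data = ...)
nrow(trimmed)
```

With tied values at the cut-off, which of the tied rows gets dropped depends on their order in the original data.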
Ron Sokoloff