
I have two data files, each consisting of 8 rows and 2151 columns. I want to run a regression between the two files for each column and pull out the slope, intercept, and R-squared values. Example: regress File 1 Column 1 (all 8 rows) against File 2 Column 1 (all 8 rows), grab the three values of interest (intercept, slope, R-squared), and move on to the next pair of columns in both files.

@thelatemail gave me a tremendous piece of code that does nearly everything.

mapply(function(x,y) coef(lm(y~x)), input1, input2)

I was hoping to tweak this a bit so I can also extract R2 values from the linear model. So I wrote a quick function to see if I could replicate that result before going further.

linear_calibration <- function(x,y) {
   co_values <- coef(lm(y~x))
   return(co_values)
}

test_output = mapply(linear_calibration(input1, input2))
write.table(test_output,file="dump.csv",sep=",")

Unfortunately when I write it this way, I get an error that states:

Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) : 
invalid type (list) for variable 'y'

I'm not sure why I get an error when I write it out this way; I must be misunderstanding something. To me, the long form of what I wrote seems identical to the original one-liner. But it isn't, so I'm trying to figure out how to modify the code to make it work.
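A likely explanation: in `mapply(linear_calibration(input1, input2))`, R evaluates `linear_calibration(input1, input2)` once, before `mapply` runs, with the whole data frames as `x` and `y`. A data frame is a list, so `lm()` fails with "invalid type (list) for variable 'y'". `mapply` needs the function itself plus the objects to iterate over, so it can call the function once per column pair. The sketch below assumes `input1` and `input2` are data frames of numeric columns with matching dimensions; the simulated data is a hypothetical stand-in for the real files.

```r
# Return intercept, slope and R-squared for one pair of columns.
linear_calibration <- function(x, y) {
  fit <- lm(y ~ x)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

# Hypothetical stand-in data: 8 rows, 5 columns per "file"
set.seed(1)
input1 <- as.data.frame(matrix(rnorm(8 * 5), nrow = 8))
input2 <- as.data.frame(matrix(rnorm(8 * 5), nrow = 8))

# Pass the function itself, not the result of calling it;
# mapply then calls it once per column pair.
test_output <- mapply(linear_calibration, input1, input2)

# 3 x 5 matrix: rows intercept/slope/r_squared, one column per column pair
print(test_output)
```

`write.csv(t(test_output), file = "dump.csv")` would then write one row per column pair. `unname()` keeps the names `intercept` and `slope` from being merged with the coefficient names `(Intercept)` and `x`.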

  • Have you tried `rbind` for merging columns together? Also, it would really help to post a reproducible example with code; see this [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more details. – John_dydx May 18 '14 at 23:24
  • `merge` I think is the right step, but you will need to provide an example so we can help you along. – Hugh May 18 '14 at 23:37
  • If you're planning on running 2151 regression analyses and picking out significant results I would seriously reconsider what you are doing. – thelatemail May 19 '14 at 00:50
  • @thelatemail Not doing that. Unfortunately the nature of my task requires doing a regression for every single pair of columns. I basically have to pull the slope, intercept, and R2 from every regression. Just trying to figure out how to tackle the beast is the difficult part! – user1819274 May 19 '14 at 01:00

1 Answer


For your first idea: for `merge` to work the way you want, you need to use its `by` argument. Create an ID column in each data frame; let's say you call it ID.

input_1$ID <- 1:8
input_2$ID <- 1:8

Then:

combined <- merge(input_1, input_2, by="ID", all.x=TRUE, all.y=TRUE)
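A small sketch of what that merge produces, using two hypothetical toy data frames in place of the real files. Because both inputs share the non-key column names `V1` to `V3`, `merge` appends its default suffixes `.x` and `.y` to tell them apart.

```r
# Hypothetical stand-ins for the two files: 8 rows, 3 measurement columns each
input_1 <- data.frame(V1 = 1:8, V2 = 11:18, V3 = 21:28)
input_2 <- data.frame(V1 = 2 * (1:8), V2 = 3 * (11:18), V3 = 4 * (21:28))

# Add a shared row identifier and join on it
input_1$ID <- 1:8
input_2$ID <- 1:8
combined <- merge(input_1, input_2, by = "ID", all.x = TRUE, all.y = TRUE)

# Key column first, then input_1's columns (".x"), then input_2's columns (".y")
print(names(combined))
```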

With regard to your second thought, this is how you would take the same column from each data frame and run a regression on it.

df <- cbind(input_1[1], input_2[1])
model <- lm(df[,1] ~ df[,2])
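To pull the three values of interest out of a single fitted model, `coef()` gives the intercept and slope and `summary()` gives R-squared. The sketch below uses hypothetical simulated data. Note that `lm(df[,1] ~ df[,2])` treats file 1 as the response. To match `lm(y ~ x)` in the question (file 2 as response), the sketch swaps the two sides.

```r
# Hypothetical stand-ins: one column of 8 rows from each file
set.seed(42)
input_1 <- data.frame(V1 = rnorm(8))
input_2 <- data.frame(V1 = 2 + 3 * input_1$V1 + rnorm(8, sd = 0.1))

df <- cbind(input_1[1], input_2[1])
# File 2 column as response, file 1 column as predictor
model <- lm(df[, 2] ~ df[, 1])

cf <- coef(model)
intercept <- unname(cf[1])
slope     <- unname(cf[2])
r_squared <- summary(model)$r.squared
print(c(intercept = intercept, slope = slope, r_squared = r_squared))
```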

Hope that helps

Sean Murphy
  • Super helpful! I think I'm on the right track now. All I have to do is slap some loops around this thing and hopefully it can chug along and pump out the numbers I need. – user1819274 May 19 '14 at 01:02
  • @user1819274 - `mapply(function(x,y) coef(lm(y ~ x)) ,input_1,input_2)` no need for loops and such. – thelatemail May 19 '14 at 01:17
  • @thelatemail Holy cow thanks so much. That is some serious R-fu that I'm lacking in. – user1819274 May 19 '14 at 01:35
  • @user1819274 - whenever you have a problem needing to compare the first, second, third etc components of 2 or more lists/vectors/data.frames etc, then `mapply/Map` can often come in handy. – thelatemail May 19 '14 at 01:46
  • @thelatemail Yeah, I'm reading up on apply functions now. Really awesome stuff. I'm still blown away that one line of code does everything I need, and here I was futzing around with this clunky POS that I was making. But the responses from everyone were very appreciated. – user1819274 May 19 '14 at 01:49
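Following the `mapply`/`Map` tip, one possible variant keeps the fitted models in a list (`Map` never simplifies its result) and then collects intercept, slope and R-squared into a data frame with one row per column pair. This is a sketch on hypothetical simulated data, not the commenters' exact code.

```r
# Hypothetical stand-in data: two "files" with 8 rows and 4 columns each
set.seed(7)
input_1 <- as.data.frame(matrix(rnorm(8 * 4), nrow = 8))
input_2 <- as.data.frame(matrix(rnorm(8 * 4), nrow = 8))

# Map pairs the i-th column of input_1 with the i-th column of input_2
# and returns a named list of lm fits (names taken from input_1).
fits <- Map(function(x, y) lm(y ~ x), input_1, input_2)

# One row per column pair: intercept, slope, R-squared
results <- do.call(rbind, lapply(fits, function(fit) {
  data.frame(intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]),
             r_squared = summary(fit)$r.squared)
}))
results$column <- names(fits)
print(results)
```

Keeping the fits around means residuals or other diagnostics can be inspected later without refitting 2151 models.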