Vectorizing a column-by-column comparison to separate values

Question

I'm working with data gathered from multi-channel electrode systems and am trying to make my code run faster than it currently does, but I can't find a good way of doing it without loops.

The gist of it is this: I have a modified average for each column (each column is a channel), and I need to compare each value in a column to the average for that column. If the value is above the adjusted mean, I need to put that value in another data frame so it can be easily read.

Here is some sample code for the problematic bit:

readout <- data.frame(dimnmames <- c("Values"))
#need to clear the dataframe in order to run it multiple times without errors
#timeFrame is just a subsection of the original data, 60 channels with upwards of a few million rows
readout <- readout[0,]
for (i in 1:ncol(timeFrame)){
  for (g in 1:nrow(timeFrame)){
    if (timeFrame[g,i] >= posCompValues[i,1]) 
      append(spikes, timeFrame[g,i])
  }
}

The data ranges from 500 thousand to upwards of 130 million readings, so if anyone could point me in the right direction I'd appreciate it.

Please make a [small reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Either share code to simulate small sample data or use `dput()` to share a copy/pasteable illustrative subset of data. — Gregor Thomas, Aug 07 '17 at 18:03
I should have waited for you to follow @Gregor's advice. However, I took a shot at it anyway. — Ben Bolker, Aug 07 '17 at 18:08

Ben Bolker · Accepted Answer · 2017-08-07T18:43:28.273

1

Something like this should work:

Return values of x greater than y:

cmpfun <- function(x,y) return(x[x>y])

For each element (column) of timeFrame, compare with the corresponding value of the first column of posCompValues:

vals1 <- Map(cmpfun,timeFrame,posCompValues[,1])

Collapse the list into a single vector:

spikes <- unlist(vals1)
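To see the three steps above on something small, here is a minimal sketch. The data are made up for illustration (three channels, five readings each, with hypothetical column names `ch1` to `ch3`); they are not the asker's data, and the thresholds are arbitrary.

```r
# Hypothetical sample data: 3 channels, 5 readings each
timeFrame <- data.frame(
  ch1 = c(1, 5, 2, 8, 3),
  ch2 = c(10, 2, 3, 1, 4),
  ch3 = c(0, 1, 0, 2, 1)
)
# One threshold per channel, stored as a 3 x 1 data frame like posCompValues
posCompValues <- data.frame(Values = c(4, 5, 3))

# Return the values of x that are greater than y
cmpfun <- function(x, y) x[x > y]

# Map() walks the columns of timeFrame and the thresholds in parallel,
# returning one vector of spikes per column (possibly of length zero)
vals1 <- Map(cmpfun, timeFrame, posCompValues[, 1])

# unlist() flattens the per-column results into a single vector
spikes <- unlist(vals1)
print(spikes)
```

Here `ch3` has no value above its threshold, so its list element is empty and contributes nothing to `spikes`.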

If you want to save both the value and the corresponding column it may be worth unpacking this a bit into a for loop:

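resList <- list()
for (i in seq_len(ncol(timeFrame))) {
   tt <- timeFrame[,i]
   ## values in column i that exceed that column's threshold
   spikes <- tt[tt>posCompValues[i,1]]
   ## skip columns with no spikes: data.frame() fails when value has length 0
   if (length(spikes)>0) {
      resList[[i]] <- data.frame(value=spikes,orig_col=i)
   }
}
## stack the per-column data frames into one
res <- do.call(rbind, resList)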
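If the data are all numeric, a loop-free variant is also possible. This is only a sketch of an alternative, not what the loop above does internally: it assumes `timeFrame` converts cleanly to a numeric matrix, and the sample data and the extra `orig_row` column are made up for illustration.

```r
# Hypothetical sample data: 3 channels, 5 readings each
timeFrame <- data.frame(
  ch1 = c(1, 5, 2, 8, 3),
  ch2 = c(10, 2, 3, 1, 4),
  ch3 = c(0, 1, 0, 2, 1)
)
posCompValues <- data.frame(Values = c(4, 5, 3))

# Work on a matrix so the comparison is done in one vectorized step
m <- as.matrix(timeFrame)

# sweep() applies ">" between each column and that column's threshold,
# giving a logical matrix with the same shape as m
above <- sweep(m, 2, posCompValues[, 1], FUN = ">")

# which(arr.ind = TRUE) returns the (row, col) position of every TRUE
idx <- which(above, arr.ind = TRUE)

# Index the matrix by those positions and keep the origin of each value
res <- data.frame(
  value    = m[idx],
  orig_row = idx[, "row"],
  orig_col = idx[, "col"]
)
print(res)
```

A column with no spikes simply produces no rows here, so no special case is needed. The trade-off is memory: the logical matrix is as large as the data, which may matter with many millions of readings.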

edited Aug 07 '17 at 18:43

answered Aug 07 '17 at 18:07

Ben Bolker


This works! Now I just need to figure out how to keep the column the data was originally in. Thank you for helping, sorry about not having a reproducible dataset, I wasn't sure how to go about that with this. – Logan Whitehouse Aug 07 '17 at 18:18
Ben, sorry to ask this but I'm not familiar with the resList and unlist functions, so I'm not really sure what's going on here. It's throwing this error: `Error in data.frame(value = tt[tt > posCompValues[i, 1]], orig_col = i) : arguments imply differing number of rows: 0, 1` – Logan Whitehouse Aug 07 '17 at 18:36
`resList` isn't a function (it's a variable I created). A good way to create a reproducible example is to take just the first few rows and columns of data (and if necessary, add some outliers/spikes so there's something to look at) and use `dput()` to dump them; edit your question to include the results. – Ben Bolker Aug 07 '17 at 18:38
What are `dim(timeFrame)` and `dim(posCompValues)`? Sounds like you might have more columns in `timeFrame` than rows in `posCompValues`? – Ben Bolker Aug 07 '17 at 18:39
`dim(timeFrame)` gives `[1] 55700 60` and `dim(posCompValues)` gives `[1] 60 1`. And I don't get any results, that would be most of my problem. With my initial loop structure nothing was getting populated, it just would run continuously. – Logan Whitehouse Aug 07 '17 at 18:41
see edits (problem is when you have a column with no spikes) – Ben Bolker Aug 07 '17 at 18:43
Ah, thank you. I will read up on this structure and hopefully be able to grasp what's going on here in the second part. I appreciate it! – Logan Whitehouse Aug 07 '17 at 18:50
