Set values less than threshold to zero, with column-specific thresholds

Question

I have two data frames. The first contains 165 columns (species names) and almost 193,000 rows; each cell holds a number from 0 to 1, the probability that the species is present in that cell.

 POINTID Abie_Xbor Acer_Camp Acer_Hyrc Acer_Obtu Acer_Pseu Achi_Gran
  2      0.0279037  0.604687 0.0388309 0.0161980 0.0143966  0.240152
  3      0.0294101  0.674846 0.0673055 0.0481405 0.0397423  0.231308
  4      0.0292839  0.603869 0.0597947 0.0526606 0.0463431  0.188875
  6      0.0331264  0.541165 0.0470451 0.0270871 0.0373348  0.256662
  8      0.0393825  0.672371 0.0715808 0.0559353 0.0565391  0.230833
  9      0.0376557  0.663732 0.0747417 0.0445794 0.0602539  0.229265

The second data frame contains 164 columns (the same species names as in the first data frame) and one row, which holds the threshold for each species: above it we assume the species is present, and below it we assume the species is absent.

Abie_Xbor Acer_Camp Acer_Hyrc Acer_Obtu Acer_Pseu Achi_Gran Acta_Spic 
 0.3155    0.2816    0.2579    0.2074    0.3007    0.3513    0.3514

What I want to do is make a new data frame that contains, for every species in the presence-probability data (my.data), the probability itself if it is above the threshold (thres), and zero if it is below the threshold.

I know that this would need a for loop and an if statement, but I am new to R and I don't know how to do this. Please help me.

Note that this question is [cross-posted on CV](http://stats.stackexchange.com/questions/78988/how-to-do-a-for-loop-and-if-statement-between-data-frames). — gung - Reinstate Monica, Dec 08 '13 at 20:49
Please post a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), ie a simplified dataset w/ an example of what you want the output to look like. Please also read the [help pages](http://stackoverflow.com/help/asking) that provide guidance on how to ask questions on SO. You may want to read the [tour page](http://stackoverflow.com/tour) as well, which has information on SO for new users. — gung - Reinstate Monica, Dec 08 '13 at 20:52

score 1 · Accepted Answer · answered Dec 08 '13 at 21:19

I think you want something like this:

(Make up small reproducible example)

 set.seed(101)
 speciesdat <- data.frame(pointID=1:10,matrix(runif(100),ncol=10,
                         dimnames=list(NULL,LETTERS[1:10])))
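 ## one-row data frame of thresholds, named like the species columns
 threshdat <- data.frame(rbind(setNames(seq(0.1,1,by=0.1),LETTERS[1:10])))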

Now process:

 thresh <- unlist(threshdat) ## make data frame into a vector
 ## 'sweep' runs the function column-by-column if MARGIN=2
 ss2 <- sweep(as.matrix(speciesdat[,-1]),MARGIN=2,STATS=thresh,
             FUN=function(x,y) ifelse(x<y,0,x))
 ## recombine results with the first column
 speciesdat2 <- data.frame(pointID=speciesdat$pointID,ss2)
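
For comparison, here is a sketch of the explicit loop the question had in mind, next to the equivalent sweep() call. The toy values and the names `dat`, `thr`, `res` and `res2` are made up for illustration and are not part of the original data.

```r
## toy data: 3 points, 2 species (values are made up for illustration)
dat <- data.frame(pointID = 1:3,
                  A = c(0.05, 0.40, 0.90),
                  B = c(0.60, 0.10, 0.30))
thr <- c(A = 0.30, B = 0.25)      # one threshold per species column

## explicit loop over the species columns
res <- dat                        # work on a copy, keep the original
for (sp in names(thr)) {
  below <- res[[sp]] < thr[[sp]]  # TRUE where the value is under the threshold
  res[[sp]][below] <- 0           # overwrite only those entries
}

## the same operation with sweep(), column by column (MARGIN = 2)
res2 <- sweep(as.matrix(dat[, -1]), MARGIN = 2, STATS = thr,
              FUN = function(x, y) ifelse(x < y, 0, x))
```

Both versions should leave values at or above the threshold unchanged. With almost 193,000 rows the vectorised forms avoid any per-cell looping, which is why sweep() is used above.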

if this solves your problem you are encouraged to click the check-mark to accept the answer ... — Ben Bolker, Dec 08 '13 at 21:42

score 1 · Answer 2 · answered Dec 08 '13 at 21:44

It's simpler to have the same number of columns (with the same meanings of course).

frame2 = data.frame(POINTID=0, frame2)

R works with vectors, so a row of frame1 can be directly compared to frame2:

frame1[1,] < frame2

You could use an explicit loop over the rows of frame1, but it is more common to use the implicit loop of "apply":

answer = apply(frame1, 1, function(x) x < frame2)

This is all a rather sloppy solution (especially changing frame2), but it hopefully demonstrates some basic R. Also, I'd generally prefer arrays and matrices when possible (they can still use labels but are generally faster).
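
The comparison above only yields TRUE/FALSE values. One possible way to get from there to the thresholded values, working on a matrix as suggested, is sketched below. The toy values are made up, and `m` and `thr` are hypothetical helper names; only `frame1`, `frame2` and `answer` come from this answer.

```r
## toy stand-ins for frame1 / frame2 (values are made up)
frame1 <- data.frame(POINTID = c(2, 3),
                     Abie_Xbor = c(0.03, 0.40),
                     Acer_Camp = c(0.60, 0.20))
frame2 <- data.frame(Abie_Xbor = 0.3155, Acer_Camp = 0.2816)

m   <- as.matrix(frame1[, -1])      # species columns only
thr <- unlist(frame2)[colnames(m)]  # thresholds in the same column order

## apply() passes each row to the function as a plain vector;
## the result comes back transposed, hence the t()
answer <- t(apply(m, 1, function(x) ifelse(x < thr, 0, x)))
answer <- data.frame(POINTID = frame1$POINTID, answer)
```

Indexing the thresholds by `colnames(m)` means the two frames do not need to list the species in the same order.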

score 0 · Answer 3 · answered Dec 09 '13 at 22:31

The first call below produces a logical matrix, which can then be used for assignment with "[<-" (assuming the multi-row data frame is named "cols" and the named threshold vector is "vec"):

sweep(cols[-1], 2, vec, ">") # identifies the items to keep

cols[-1][ sweep(cols[-1], 2, vec, "<") ] <- 0

Your example produced a warning about the mismatch of the number of columns with the length of the vector, but presumably you can adjust the length of the vector to be the correct number of entries.
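
One way to adjust the vector is to match the thresholds to the columns by name, so that extra species in "vec" (such as Acta_Spic in the question) are ignored. This is a sketch on made-up toy values; `species` and `vec2` are hypothetical helper names.

```r
## toy data (made up); 'vec' has one species more than 'cols'
cols <- data.frame(POINTID = 1:2,
                   Abie_Xbor = c(0.03, 0.40),
                   Acer_Camp = c(0.60, 0.20))
vec <- c(Abie_Xbor = 0.3155, Acer_Camp = 0.2816, Acta_Spic = 0.3514)

species <- intersect(names(cols)[-1], names(vec))  # columns that have a threshold
vec2 <- vec[species]                               # same order as those columns

## zero out every value below its column's threshold
cols[species][sweep(cols[species], 2, vec2, "<")] <- 0
```

Because `vec2` has exactly one entry per selected column, sweep() should no longer warn about a length mismatch.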
