
NOTE: I technically know how to do this, but I feel like there has to be a "nicer" way to do it. If such questions are not allowed here, just delete it, but I would really like to improve my R style, so any suggestions are welcome.

I have a data frame created with data <- data.frame(foo=rep(c(-1,2),5)):

   foo
1   -1
2    2
3   -1
4    2
5   -1
6    2
7   -1
8    2
9   -1
10   2

Now I would like to be able to set the entries of foo to a certain value (for this example, let's say 1) if the current entry is smaller than that value. So my desired output would be

   foo
1    1
2    2
3    1
4    2
5    1
6    2
7    1
8    2
9    1
10   2

I feel like there should be something like data$foo <- max(data$foo,1) that does the job (but of course, that "maxes" over the whole column).
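To make the problem with max() concrete, here is a small sketch (the name wrong is only illustrative): max() reduces all of its arguments to a single number, and assigning that scalar to a column recycles it into every row.

```r
data <- data.frame(foo = rep(c(-1, 2), 5))

# max() collapses the whole vector plus the 1 into one number
m <- max(data$foo, 1)

# Assigning a single value to a column recycles it down all rows
wrong <- data
wrong$foo <- max(wrong$foo, 1)
print(wrong$foo)  # every entry is now 2
```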

Is there an elegant way to do this?

Both data$foo <- ifelse(data$foo < 1,1,data$foo) and data$foo <- lapply(data$foo,function(x) max(1,x)) just feel somewhat "ugly".
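For comparison, the two approaches from the question can be run side by side. One detail worth knowing: lapply() always returns a list, so assigning its result to data$foo produces a list column rather than a numeric one. The vapply() variant below is my own addition (not from the question); it does the same per-element work but returns a plain numeric vector.

```r
data <- data.frame(foo = rep(c(-1, 2), 5))

# ifelse() is vectorised and returns a numeric vector of the same length
via_ifelse <- ifelse(data$foo < 1, 1, data$foo)

# lapply() returns a list with one element per entry
via_lapply <- lapply(data$foo, function(x) max(1, x))
print(class(via_lapply))  # "list"

# vapply() with a numeric(1) template returns a typed numeric vector
via_vapply <- vapply(data$foo, function(x) max(1, x), numeric(1))

print(identical(via_ifelse, via_vapply))
```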

Tschösi

2 Answers


max gives you the maximum of the whole column, but in your case you need pmax (parallel maximum), which returns the element-wise maximum of 1 and each number in the vector.

data$foo <- pmax(data$foo, 1)
data

#   foo
#1    1
#2    2
#3    1
#4    2
#5    1
#6    2
#7    1
#8    2
#9    1
#10   2
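A short sketch of how pmax behaves element by element, including with missing values (the vector x is a made-up example, not from the question). The scalar 1 is recycled against every entry, and pmax's na.rm argument controls whether an NA entry stays NA or is ignored in favour of the other argument.

```r
x <- c(-1, NA, 2)

# Position i of the result is max(x[i], 1); the scalar 1 is recycled
default_na <- pmax(x, 1)
print(default_na)  # 1 NA 2

# With na.rm = TRUE the missing entry is dropped, so 1 wins at that position
dropped_na <- pmax(x, 1, na.rm = TRUE)
print(dropped_na)  # 1 1 2
```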
Ronak Shah

This works:

data <- data.frame(foo=rep(c(-1,2),5))

val <- 1

data[data$foo < val, ] <- val

Let's break this down. data$foo extracts the column as a vector. data$foo < val checks which elements of this vector are smaller than val, returning a logical vector of the same length with TRUE where the condition holds and FALSE elsewhere.
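Printing that intermediate logical vector makes the step easier to see (mask is just an illustrative name):

```r
data <- data.frame(foo = rep(c(-1, 2), 5))
val <- 1

# One TRUE/FALSE per row: TRUE where foo is below val
mask <- data$foo < val
print(mask)  # TRUE FALSE TRUE FALSE ...
```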

Finally, the entire line data[data$foo < val, ] <- val uses that vector of TRUE and FALSE to select the rows of data (via the row position in [, ]) and assigns val to them.
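One caveat: leaving the column position of [, ] empty selects every column, so on a data frame with more than one column the selected rows are overwritten in all of them. The sketch below adds a hypothetical second column bar purely to show the difference; naming the column ("foo") in the index restricts the assignment to foo.

```r
# 'bar' is a hypothetical extra column, added only for this illustration
data <- data.frame(foo = rep(c(-1, 2), 5), bar = 11:20)
val <- 1

# Row-only index: every column of the matching rows is set to val
all_cols <- data
all_cols[all_cols$foo < val, ] <- val

# Row and column index: only 'foo' is changed
one_col <- data
one_col[one_col$foo < val, "foo"] <- val

print(all_cols$bar)  # odd rows of bar were overwritten with 1
print(one_col$bar)   # bar is untouched
```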

Gimelist
Although I personally think the pmax solution is a little bit more elegant in this special case, this solution is more flexible and also seems to be a little faster (approx. 20% in my tests). For an even faster solution, use the data.table version data[foo < 1, foo := 1]. – bratwoorst711 Feb 27 '21 at 13:21