I am quite new to R and have run into a problem I apparently can't solve by myself, though it should be fairly easy.

I want to write a generic function that manipulates column n in data frame df. It should perform a simple task: for each row where n < 5, replace that value with a random number between 1 and 4.

df <- data.frame(n = 1:10, y = letters[1:10],
                 stringsAsFactors = FALSE)

What is the most elegant solution?

markus
Henri
  • Did you read about `?replace` ? Here are some [“replace” function examples](https://stackoverflow.com/questions/11811027/replace-function-examples) – markus May 15 '19 at 20:42
  • I didn't. But I will now. Thanks. – Henri May 15 '19 at 20:43

1 Answer

One way to do this is to create a logical index based on the column, subset the column with that index, and assign the sampled values to the selected elements:

f1 <- function(dat, col) {
  i1 <- dat[[col]] < 5                                    # TRUE where the value is below 5
  dat[[col]][i1] <- sample(1:4, sum(i1), replace = TRUE)  # one random draw per TRUE
  dat                                                     # return the modified data frame
}

f1(df, "n")
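
The comment on the question points to `replace()`. As a rough sketch only, the same logic could be written with it as shown here; the name `f2` is hypothetical, and `set.seed()` is added purely so the random draw is repeatable. Like `f1`, it assumes the column contains no `NA` values.

```r
# Hypothetical variant of f1 that uses replace() instead of indexed assignment.
f2 <- function(dat, col) {
  i1 <- dat[[col]] < 5                       # TRUE where the value is below 5
  new_vals <- sample(1:4, sum(i1), replace = TRUE)
  dat[[col]] <- replace(dat[[col]], i1, new_vals)
  dat
}

df <- data.frame(n = 1:10, y = letters[1:10],
                 stringsAsFactors = FALSE)

set.seed(42)       # only to make the example repeatable
out <- f2(df, "n")
out$n[5:10]        # values of 5 and above are left untouched
```

Both versions return a modified copy, so `df` itself stays unchanged unless the result is assigned back to it.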
akrun
  • Thanks, very cool. I can follow everything apart from sum(i1). What does the sum function do in this context? – Henri May 15 '19 at 20:58
  • @Henrik The second parameter of `sample()` is `size`, the length of the random vector you want to generate. `i1` is a logical vector of `TRUE FALSE TRUE TRUE ...` values. Calling `sum` on a logical vector is equivalent to counting the *number of `TRUE` values* in it (`FALSE` is `0`, `TRUE` is `1`). So summing `i1` gives the number of values in `dat[[col]]` that are `< 5`, which tells `sample()` how many random values from `[1, 4]` to generate. – utubun May 15 '19 at 21:16
  • @Henrik As utubun suggested, the `sum` is just for counting the number of elements that are less than 5. – akrun May 16 '19 at 01:42
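
The counting behaviour described in these comments can be checked in isolation. This is a small sketch that uses the question's column values; the variable name `new_vals` is made up for illustration.

```r
n  <- 1:10
i1 <- n < 5        # logical vector: TRUE for 1 to 4, FALSE for 5 to 10

sum(i1)            # 4, because each TRUE counts as 1 and each FALSE as 0

# sum(i1) is passed as the size argument, so sample() returns one value per TRUE
new_vals <- sample(1:4, sum(i1), replace = TRUE)
length(new_vals)   # 4
```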