I have two datasets, one where the key variable is unique on each row and another where the key variable repeats on a number of rows; like in this example data:

set.seed(300)
data1 <- data.frame(key = LETTERS[1:5], value = rnorm(5, 1, 1))
data2 <- data.frame(key = rep(LETTERS[1:5], 2), value = rnorm(10, 1, 1))

I want a function to recode the value variable in data1 according to the minimum of the value variable in data2 for the matching key.

To do so, I've been trying to repeat it on each key value, like this:

data1$value[data1$key == "A"] <- min(range(data2$value[data2$key == "A"]))
data1$value[data1$key == "B"] <- min(range(data2$value[data2$key == "B"]))
data1$value[data1$key == "C"] <- min(range(data2$value[data2$key == "C"]))
data1$value[data1$key == "D"] <- min(range(data2$value[data2$key == "D"]))
data1$value[data1$key == "E"] <- min(range(data2$value[data2$key == "E"]))

I know there must be a bunch of different ways to accomplish this, but I haven't found the way (maybe using dplyr?) and I would like to avoid a for loop.

Thanks in advance!

EDIT:

I don't want to dplyr::summarise a new dataframe; I need to keep the other variables present in data1 (yes, originally there are more variables than I put on here).

David Jorquera

  • So the current value of `data1` doesn't matter; you just want the min by group from `data2` (and to replace `data1` with that). Look at the [How to sum by group R-FAQ](https://stackoverflow.com/q/1660124/903061) and replace `sum` with `min` in your favorite answer. With `dplyr`: `data1 <- data2 %>% group_by(key) %>% summarize(value = min(value))`. Or in base R: `data1 <- aggregate(value ~ key, data2, min)` – Gregor Thomas May 28 '19 at 16:59
  • If there are other columns of `data1` you need to keep, treat this as an intermediate step and follow it up with a join (`left_join()` in `dplyr`, `merge()` in base R). – Gregor Thomas May 28 '19 at 17:01
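
The two-step idea from the comments (summarise `data2` by key, then bring the result back into `data1`) could be sketched in base R roughly as follows. This is only an illustration under assumptions: the `other` column is a hypothetical stand-in for the extra variables the question says exist in the real `data1`, and the in-place update uses a `match()` lookup rather than the join named in the comments; a `merge()` variant is shown after it for comparison.

```r
set.seed(300)
data1 <- data.frame(key = LETTERS[1:5], value = rnorm(5, 1, 1),
                    other = 1:5)  # 'other' is a hypothetical extra column
data2 <- data.frame(key = rep(LETTERS[1:5], 2), value = rnorm(10, 1, 1))

# Step 1: minimum of value per key in data2 (one row per key)
mins <- aggregate(value ~ key, data = data2, FUN = min)

# Step 2a: look up each data1 key in the summary and overwrite value in place.
# match() keeps data1's row order and leaves its other columns untouched.
data1$value <- mins$value[match(data1$key, mins$key)]

# Step 2b (alternative, closer to the join in the comments):
# drop the old value column, then merge the per-key minimum back in.
# Note that merge() sorts the result by key.
joined <- merge(data1[, setdiff(names(data1), "value"), drop = FALSE],
                mins, by = "key")
```

With the `match()` version, any key in `data1` that never appears in `data2` ends up as `NA`; with `merge()`, such rows are dropped unless `all.x = TRUE` is passed.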
