1

I have a question regarding efficiently populating an R dataframe based on data retrieved from another dataframe.

So my input typically looks like:

dfInput <- data.frame(start = c(1,6,17,29), end = c(5,16,28,42), value = c(1,2,3,4))

start  end  value
1      5    1
6      16   2
17     28   3
29     42   4

I want to find the min and max values in cols 1 and 2 and create a new dataframe with a row for each value in that range:

rangeMin <- min(dfInput$start)
rangeMax <- max(dfInput$end)

dfOutput <- data.frame(index = c(rangeMin:rangeMax), value = 0)

And then populate it with the appropriate "values" from the input dataframe:

for (i in seq(nrow(dfOutput))) {
  lookup <- dfOutput[i,"index"]
  dfOutput[i, "value"] <- dfInput[which(dfInput$start <= lookup &
                                  dfInput$end >= lookup),"value"]
}

This for-loop achieves what I want to do, but it feels like this is a very convoluted way to do it.

Is there a way that I can do something like:

dfOutput$value <- dfInput[which(dfInput$start <= dfOutput$index &
                          dfInput$end >= dfOutput$index),"value"]

Or something else to populate the values when instantiating dfOutput.

I feel like this is pretty basic but I'm new to R, so many thanks for any help!

Rémi Coulaud
  • 1,684
  • 1
  • 8
  • 19
awenborn
  • 417
  • 1
  • 4
  • 13

1 Answers1

1

You can create a sequence between start and end :

library(dplyr)

dfInput %>%
  mutate(index = purrr::map2(start, end, seq)) %>%
  tidyr::unnest(index) %>%
  select(-start, -end)

# A tibble: 42 x 2
#   value index
#   <dbl> <int>
# 1     1     1
# 2     1     2
# 3     1     3
# 4     1     4
# 5     1     5
# 6     2     6
# 7     2     7
# 8     2     8
# 9     2     9
#10     2    10
# … with 32 more rows

In base R :

do.call(rbind, Map(function(x, y, z) 
   data.frame(index = x:y, value = z), dfInput$start, dfInput$end, dfInput$value))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213