Use dplyr complete() to create new variable based on min/max values

Question

EDIT: although this question has been closed, note that the answers here use a very different approach (with dplyr) from the answers to the original question asked in 2012(!). These newer answers may be helpful for other users.

I have a dataset of sites with the minimum and maximum years in which they were operational. I want to expand this dataset so that each year a site was operational has its own row.

For example:

set.seed(42)
df <- data.frame(
  site = LETTERS[1:10],
  minY = sample(1980:1990, 10),
  maxY = sample(2000:2010, 10)
)
df
   site minY maxY
1     A 1980 2007
2     B 1984 2006
3     C 1990 2003
4     D 1988 2000
5     E 1981 2004
6     F 1983 2005
7     G 1986 2008
8     H 1989 2001
9     I 1987 2009
10    J 1985 2010

So in my final dataset, site A would have 28 rows (one for each year it was operating).
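As a quick sanity check, the number of rows per site is maxY - minY + 1 (both endpoints included). This sketch copies the values from the printed table rather than re-running sample(), since the random draws can differ between R versions; `df_printed` and `n_years` are hypothetical names.

```r
# Values copied from the printed df above (avoids depending on the RNG version)
df_printed <- data.frame(
  site = LETTERS[1:10],
  minY = c(1980, 1984, 1990, 1988, 1981, 1983, 1986, 1989, 1987, 1985),
  maxY = c(2007, 2006, 2003, 2000, 2004, 2005, 2008, 2001, 2009, 2010)
)

# One row per operational year, inclusive of both endpoints
n_years <- df_printed$maxY - df_printed$minY + 1
names(n_years) <- df_printed$site

n_years["A"]  # 28
sum(n_years)  # 210 rows in the expanded dataset
```

The total of 210 matches the row count reported in the accepted answer's output.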

I've been trying to use the complete function, but I keep getting an error message:

complete(df,
         nesting(site),
         fill = list(value1 = minY, value2 = maxY))
Error in vec_is_list(replace) : object 'minY' not found
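The error arises because the `fill` argument of tidyr::complete() expects a named list of single replacement values for missing cells, so `minY` is looked up as an ordinary R object rather than as a column. On its own, complete() makes implicit missing combinations explicit; it does not expand each row's min/max range into a sequence. One alternative (not a complete()-based solution) is to build a list-column of year sequences and unnest it. A minimal sketch, assuming dplyr and tidyr are installed, using the first two sites from the table; `years_long` is a hypothetical name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

years_long <- df %>%
  # Map() returns one integer vector of years per row, stored as a list-column
  mutate(year = Map(seq, minY, maxY)) %>%
  # unnest() gives each element of the list-column its own row
  unnest(year) %>%
  select(site, year)

nrow(years_long)  # 28 + 23 = 51
```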

Please also add the desired output. – Ed_Gravy Nov 21 '22 at 20:43

score 4 · Accepted Answer · answered Nov 21 '22 at 20:47

Maybe this works for you, using dplyr's summarize():

library(dplyr)

df %>% 
  rowwise() %>% 
  summarize(site, year = seq(minY, maxY, 1))
# A tibble: 210 × 2
   site   year
   <chr> <dbl>
 1 A      1980
 2 A      1981
 3 A      1982
 4 A      1983
 5 A      1984
 6 A      1985
 7 A      1986
 8 A      1987
 9 A      1988
10 A      1989
# … with 200 more rows
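Since dplyr 1.1.0, returning more than one row per group from summarize() is deprecated (it still works but emits a warning), and reframe() is the suggested replacement. A sketch of the same idea with reframe(), grouping by site instead of using rowwise() and assuming each site appears only once; the two-site `df` is a trimmed copy of the question's data:

```r
library(dplyr)

df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

years_long <- df %>%
  group_by(site) %>%
  # reframe() may return any number of rows per group; site is kept as a key
  reframe(year = seq(minY, maxY))

years_long  # ungrouped tibble with columns site and year, 51 rows
```

reframe() always returns an ungrouped tibble, so no ungroup() is needed afterwards.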

score 2 · Answer 2 · answered Nov 21 '22 at 20:47

You can use tidyr::uncount() to duplicate rows according to a weight. In your case, adding one row per year of operation can be done like this:

library(tidyr)

df |>
  uncount(weights = maxY - minY + 1)
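For comparison, the row duplication that uncount() performs can be sketched in base R by repeating row indices; `n_rows` and `expanded` are hypothetical names, and the two-site `df` is a trimmed copy of the question's data:

```r
df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

# Number of copies of each row: one per operational year
n_rows <- df$maxY - df$minY + 1

# rep() repeats each row index n_rows times; indexing then duplicates the rows
expanded <- df[rep(seq_len(nrow(df)), times = n_rows), ]
rownames(expanded) <- NULL

nrow(expanded)  # 51
```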

If you also want a column holding each year, you can add it with dplyr::mutate():

library(dplyr)

df |>
  uncount(weights = maxY - minY + 1) |>
  group_by(site) |>
  # within each site, count from the first to the last operational year
  mutate(unique_year = seq.default(min(minY), max(maxY)))

This results in a grouped tibble with one row for each year from minY to maxY (inclusive) per site, plus a unique_year column holding that year.
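A variant that avoids group_by() uses the `.id` argument of uncount(), which adds a counter numbering the copies of each original row 1, 2, and so on. A sketch assuming R 4.1.0 or later (for `|>`) with dplyr and tidyr installed; `copy` and `years_long` are hypothetical names:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

years_long <- df |>
  # .id adds a per-row counter: 1 for the first copy, 2 for the second, ...
  uncount(weights = maxY - minY + 1, .id = "copy") |>
  # the k-th copy of a row corresponds to year minY + k - 1
  mutate(year = minY + copy - 1) |>
  select(site, year)
```

Because the year is computed from each row's own minY, this does not rely on site being unique.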

answered by FactOREO

I just posted literally the same answer in the linked main thread on this problem. It seems to me that your post came first, so let me know if you would like to post it there and I will remove my post :) – tmfmnk Nov 21 '22 at 21:00
@FactOREO I wasn't able to get this solution to work. I assumed your |> was a pipe (I use %>%), so I modified your code that way and got the following error: Error in View : no applicable method for 'mutate' applied to an object of class "NULL" – tnt Nov 21 '22 at 21:07
@tnt |> is the base R pipe, available in R 4.1.0 and later if I am not mistaken. It is basically a more convenient way of writing f(g(x)) as x |> g() |> f(), but it lacks some functionality that the `magrittr` pipe `%>%` has (like the `.` placeholder, which can be used more flexibly; the base pipe's placeholder `_`, available from R 4.2.0, works only for named arguments and, I believe, only in the immediately following function call, not inside other functions called from within it). – FactOREO Nov 22 '22 at 13:16
@tmfmnk It's fine, I am only here to answer the odd question and help other people seeking advice. :) – FactOREO Nov 22 '22 at 13:17
Thanks @FactOREO. I had not seen it before! – tnt Nov 23 '22 at 16:29
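To illustrate FactOREO's comment on the base pipe: `x |> g() |> f()` is the same call as `f(g(x))`, and from R 4.2.0 the `_` placeholder can stand for the piped value, but only when passed to a named argument. A small sketch using the built-in mtcars data; `piped`, `nested` and `fit` are hypothetical names:

```r
# Base pipe (R >= 4.1.0): x |> g() |> f() is the same call as f(g(x))
x <- c(4, 9, 16)
piped  <- x |> sqrt() |> sum()
nested <- sum(sqrt(x))
identical(piped, nested)  # TRUE

# Placeholder (R >= 4.2.0): `_` must be passed to a named argument, here data =
fit <- mtcars |> lm(mpg ~ wt, data = _)

# The magrittr equivalent would use `.`: mtcars %>% lm(mpg ~ wt, data = .)
```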