Replacing the last value within groups with different values

Question

My question is similar to this post, but the difference is instead of replacing the last value within each group/id with all 0's, different values are used to replace the last value within each group/id.

Here is an example (I borrowed it from the above link):

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    9999
6         2    0
7         2    9
8         2    500
9         3    0
10        3    1

In the above link, the last value within each group/id was replaced by a zero, using something like:

df %>%
  group_by(id) %>%
  mutate(Time = c(Time[-n()], 0))

And the output was

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    0
6         2    0
7         2    9
8         2    0
9         3    0
10        3    0

In my case, I would like the last value within each group/id to be replaced by a different value. Originally, the last value within each group/id was 9999, 500, and 1. Now I would like: 9999 is replaced by 5, 500 is replaced by 12, and 1 is replaced by 92. The desired output is:

          id  Time
1         1    3
2         1    10
3         1    1
4         1    0
5         1    5
6         2    0
7         2    9
8         2    12
9         3    0
10        3    92

I tried this one:

df %>%
  group_by(id) %>%
  mutate(Time = replace(Time, n(), c(5,12,92))),

but it did not work.

Please provide sample data and code. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — scoa, Aug 04 '15 at 08:28
You can't insert several values into a single one. You should provide a reproducible example and your desired output as pointed out by @scoa — David Arenburg, Aug 04 '15 at 08:50
@goren9 Are you Jason? If not, why have you rewritten his question? — Frank, Aug 04 '15 at 18:20
When you say 'it did not work', what output did you get instead? — Nathaniel Ford, Aug 04 '15 at 19:53
The last value within each group was replaced by "5", the 1st value in c(5,12,92). And I got warning messages: 1: In mutate_impl(.data, dots) : number of items to replace is not a multiple of replacement length ... — Jason, Aug 04 '15 at 20:34

David Arenburg · Answer 1 · 2015-08-06T08:49:49.817

This could be solved using almost identical solution as I posted in the linked question. e.g., just replace 0L with the desired values

library(data.table)
indx <- setDT(df)[, .I[.N], by = id]$V1
df[indx, Time := c(5L, 12L, 92L)]
df
#     id Time
#  1:  1    3
#  2:  1   10
#  3:  1    1
#  4:  1    0
#  5:  1    5
#  6:  2    0
#  7:  2    9
#  8:  2   12
#  9:  3    0
# 10:  3   92

So to add some explanations:

.I is identical to row_number() or 1:n() in dplyr for an ungrouped data, e.g. 1:nrow(df) in base R
.N is like n() in dplyr, e.g., the size of a certain group (or the whole data set). So basically when I run .I[.N] by group, I'm retrieving the global index of the last row of each group
The next step is just use this index as a row index within df while assigning the desired values to Time by reference using the := operator.

Edit

Per OPs request, here's a possible dplyr solution. Your original solution doesn't work because you are working per group and thus you were trying to pass all three values to each group.

The only way I can think of is to first calculate group sizes, then ungroup and then mutate on the cumulative sum of these locations, something among these lines

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(indx = n()) %>%
  ungroup() %>%
  mutate(Time = replace(Time, cumsum(unique(indx)), c(5, 12, 92))) %>%
  select(-indx)

# Source: local data frame [10 x 2]
# 
#    id Time
# 1   1    3
# 2   1   10
# 3   1    1
# 4   1    0
# 5   1    5
# 6   2    0
# 7   2    9
# 8   2   12
# 9   3    0
# 10  3   92

Great. Thank you so much. I use "data.table" a lot as well, but just out of curiosity: how to do it using something like "mutate(Time = replace(Time, n(), c(5,12,92)))"? — Jason, Aug 06 '15 at 04:34

score 3 · Answer 2 · answered Aug 06 '15 at 09:05

Another way using data.table would be to create another data.table which contains the values to be replaced with for a given id, and then join and update by reference (simultaneously).

require(data.table) # v1.9.5+ (for 'on = ' feature)
replace = data.table(id = 1:3, val = c(5L, 12L, 9L)) # from @David
setDT(df)[replace, Time := val, on = "id", mult = "last"]

#     id Time
#  1:  1    3
#  2:  1   10
#  3:  1    1
#  4:  1    0
#  5:  1    5
#  6:  2    0
#  7:  2    9
#  8:  2   12
#  9:  3    0
# 10:  3    9

In data.table, joins are considered as an extension of subsets. It's natural to think of doing whatever operation we do on subsets also on joins. Both operations do something on some rows.

For each replace$id, we find the last matching row (mult = "last") in df$id, and update that row with the corresponding val.

Installation instructions for v1.9.5 here. Hope this helps.

Replacing the last value within groups with different values

2 Answers2

Linked