2

Hi all, I have part of a data set:

# A tibble: 10 × 2
      id value
   <dbl> <dbl>
1      1     2
2      1     2
3      1     2
4      5     2
5      6     3
6      7     0
7      8     4
8      8     4
9      9     1
10     9     1

I would like to add 1 to every subsequent value of the same id. E.g. the first value of id 1 stays 2, the second value of id 1 becomes 3, and the third value of id 1 becomes 4. However, ids that appear only once (5, 6, 7) are left as they are. So essentially it would look like this:

# A tibble: 10 × 2
      id value
   <dbl> <dbl>
1      1     2
2      1     3
3      1     4
4      5     2
5      6     3
6      7     0
7      8     4
8      8     5
9      9     1
10     9     2

Thanks in advance!

Joey

DATA

structure(list(id = c(1, 1, 1, 5, 6, 7, 8, 8, 9, 9), value = c(2, 
2, 2, 2, 3, 0, 4, 4, 1, 1)), .Names = c("id", "value"), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

Expected output:

structure(list(id = c(1, 1, 1, 5, 6, 7, 8, 8, 9, 9), value = c(2, 
3, 4, 2, 3, 0, 4, 5, 1, 2)), .Names = c("id", "value"), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))
SabDeM
Master_Yoda
  • Ciao! Welcome to SO. First of all, you should read [here](http://stackoverflow.com/help/how-to-ask) about how to ask a good question; a good question is more likely to be solved and to get you help. It is also worth reading [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), which explains how to create a reproducible example in R. Help users to help you by providing a piece of your data, the desired output, and the things you have already tried. – SabDeM Apr 30 '17 at 21:52
  • I will definitely look into that, thanks. – Master_Yoda Apr 30 '17 at 22:23

4 Answers

2

A simple data.table solution would be:

library(data.table)
dt <- as.data.table(df)
# Within each id group, add 0, 1, 2, ... to value (.N is the group size)
dt[, value2 := value + ((1:.N) - 1), by = id]

dt
#    id value value2
# 1:  1     2      2
# 2:  1     2      3
# 3:  1     2      4
# 4:  5     2      2
# 5:  6     3      3
# 6:  7     0      0
# 7:  8     4      4
# 8:  8     4      5
# 9:  9     1      1
#10:  9     1      2

Another solution would be to use base R and rle:

# lapply avoids sapply simplifying to a matrix when all runs have the same length
df$value2 <- df$value + unlist(lapply(rle(df$id)$lengths, function(x) (1:x) - 1))
Mike H.
  • Hi Mike, your first solution worked perfectly. Would you mind explaining the syntax in your solution? My understanding is that it creates the column value2 as the sum of the value column and the number of times the same id has appeared so far, and the minus 1 is there so that the first instance of each id remains untouched. – Master_Yoda Apr 30 '17 at 22:22
  • Sure! I think you have it correct. It's for each `by` group, add `1:length(group)` and subtract 1 so that the original value stays the same. Thus if a `by` group has 3 observations it will add `c(1,2,3) - 1` to that group. – Mike H. Apr 30 '17 at 22:28
  • Hi Mike, I just have one last question. I was under the impression that ":=", "==", and "<-" are all assignment operators; however, my dataset looks drastically different when I replace ":=" with "==" or "<-". Thanks a bunch! – Master_Yoda Apr 30 '17 at 22:32
  • `:=` is syntax specific to `data.table` and basically creates/updates a column. `==` is a comparison operator and `<-` is an assignment operator. Hope that makes sense. You can learn more about them by doing `?\`:=\`` or `?\`==\`` – Mike H. Apr 30 '17 at 22:42
  • Oops I meant to type "=" instead of "==". Anyway your help is greatly appreciated, thanks again! – Master_Yoda Apr 30 '17 at 22:59
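
The explanation in the comments above can be made concrete with a small sketch. The data and the intermediate column name `offset` below are made up for illustration; they are not part of the original answer. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data: id 1 appears three times, id 5 once, id 8 twice
dt <- data.table(id = c(1, 1, 1, 5, 8, 8), value = c(2, 2, 2, 2, 4, 4))

# Within each id group, .N is the group size, so seq_len(.N) - 1 is 0, 1, 2, ...
dt[, offset := seq_len(.N) - 1L, by = id]

# := adds the column to dt in place; no reassignment with <- is needed
dt[, value2 := value + offset]

dt$offset  # 0 1 2 0 0 1
dt$value2  # 2 3 4 2 4 5
```

Groups with a single row get an offset of 0, which is why ids that appear only once keep their original value.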
1

Here is a solution with dplyr. Note that it is not robust if the numbers are not progressive (in other words, increasing), but I gather that they are. If not, we will have to find another solution.

library(dplyr)
df %>% group_by(id) %>%
    transmute(value = seq(from = min(value), by = 1, length.out = length(value)))
Adding missing grouping variables: `id`
Source: local data frame [10 x 2]
Groups: id [6]

      id value
   <dbl> <dbl>
1      1     2
2      1     3
3      1     4
4      5     2
5      6     3
6      7     0
7      8     4
8      8     5
9      9     1
10     9     2
SabDeM
  • Hi, your solution works with the small sample data I posted; however, it drops all the other variables, so I just replaced transmute with mutate. Thanks for the help! – Master_Yoda May 01 '17 at 00:20
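
To illustrate the robustness caveat mentioned in this answer, here is a rough sketch on made-up data where the values within one id are not all equal. The data frame `df_mixed` and the column names `from_min` and `offset` are hypothetical, chosen only for this comparison. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: one id whose values differ from row to row
df_mixed <- data.frame(id = c(1, 1, 1), value = c(5, 2, 3))

res <- df_mixed %>%
  group_by(id) %>%
  mutate(
    # Sequence starting at the group minimum: ignores each row's own value
    from_min = seq(from = min(value), by = 1, length.out = n()),
    # Row position within the group added to each row's own value
    offset = value + row_number() - 1
  ) %>%
  ungroup()

res$from_min  # 2 3 4
res$offset    # 5 3 5
```

The two columns agree only when every value within an id is the same, as in the sample data from the question.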
1

Using dplyr you could do the following...

library(dplyr)
df2 <- df %>% group_by(id) %>% mutate(value = value + seq_along(id) - 1)
Andrew Gustar
0

Or we can use base R

df1$value <- with(df1, ave(value, id, FUN = seq_along) + value - 1)
df1$value
#[1] 2 3 4 2 3 0 4 5 1 2
akrun
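
One difference between the base R approaches in this thread may matter on real data: `rle` counts consecutive runs, while `ave` groups by id across the whole column. The sketch below uses made-up data (`df_gap`, `run_offset`, and `group_offset` are hypothetical names) in which id 1 reappears after another id. It uses `sequence()` as a compact equivalent of the per-run `1:x` construction.

```r
# Hypothetical data: id 1 appears again after id 5
df_gap <- data.frame(id = c(1, 1, 5, 1), value = c(2, 2, 2, 2))

# Run-based offsets: the counter restarts whenever the id changes
run_offset <- sequence(rle(df_gap$id)$lengths) - 1

# Group-based offsets: the counter continues across non-adjacent rows of an id
group_offset <- ave(df_gap$value, df_gap$id, FUN = seq_along) - 1

run_offset    # 0 1 0 0
group_offset  # 0 1 0 2
```

With ids stored contiguously, as in the question, both give the same result; sorting by id first would make the run-based version match the grouped one.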