row numbers for explicit rows in r

Question

I need to get row numbers for explicit rows grouped over id. Let's say dataframe (df) looks like this:

I need to create one more column that would give row number sequence excluding the case where b == 0.

desired output:

 id  a   b   row
 3   2   NA   1
 3   3   2    2
 3   10  NA   3
 3   21  0    -
 3   2   NA   4
 4   1   5    1
 4   1   0    -
 4   5   NA   2

I used dplyr but not able to achieve the same, My code:

df <- df %>%
       group_by(id) %>%
       mutate(row = row_number(id[b != 0]))

Please suggest some better way to do this.

Or `mutate(row = replace(cumsum(is.na(b)), b == 0, "-"))` but you should note that by placing a "-" into a numeric column, it will be coerced to character. — talat, Aug 19 '15 at 09:13

score 7 · Accepted Answer · answered Aug 19 '15 at 11:54

I would propose using the data.table package for its nice capability in operating on subsets and thus avoiding inefficient operations such as ifelse or evaluation the whole data set. Also, it is better to keep you vector in numeric class (for future operations), thus NA will be probably preferable to - (character), here's a possible solution

library(data.table)
setDT(df)[is.na(b) | b != 0, row := seq_len(.N), by = id]
#    id  a  b row
# 1:  3  2 NA   1
# 2:  3  3  2   2
# 3:  3 10 NA   3
# 4:  3 21  0  NA
# 5:  3  2 NA   4
# 6:  4  1  5   1
# 7:  4  1  0  NA
# 8:  4  5 NA   2

The idea here is to operate only on the rows where is.na(b) | b != 0 and generate a sequence of each group size (.N) while updating row in place (using :=). All the rest of the rows will be assigned with NAs by default.

Argh, back to the old `==` vs `%in%` vs join distinctions http://stackoverflow.com/a/16222108 (related to Arun's suggestion in the comment above). — Frank, Aug 19 '15 at 17:01

row numbers for explicit rows in r

1 Answers1