3

I need to get row numbers for explicit rows grouped over id. Let's say dataframe (df) looks like this:

 id  a   b 
 3   2   NA
 3   3   2
 3   10  NA
 3   21  0
 3   2   NA
 4   1   5
 4   1   0
 4   5   NA

I need to create one more column that would give row number sequence excluding the case where b == 0.

desired output:

 id  a   b   row
 3   2   NA   1
 3   3   2    2
 3   10  NA   3
 3   21  0    -
 3   2   NA   4
 4   1   5    1
 4   1   0    -
 4   5   NA   2

I used dplyr but not able to achieve the same, My code:

df <- df %>%
       group_by(id) %>%
       mutate(row = row_number(id[b != 0]))

Please suggest some better way to do this.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Dheeraj Singh
  • 715
  • 1
  • 12
  • 24

1 Answers1

7

I would propose using the data.table package for its nice capability in operating on subsets and thus avoiding inefficient operations such as ifelse or evaluation the whole data set. Also, it is better to keep you vector in numeric class (for future operations), thus NA will be probably preferable to - (character), here's a possible solution

library(data.table)
setDT(df)[is.na(b) | b != 0, row := seq_len(.N), by = id]
#    id  a  b row
# 1:  3  2 NA   1
# 2:  3  3  2   2
# 3:  3 10 NA   3
# 4:  3 21  0  NA
# 5:  3  2 NA   4
# 6:  4  1  5   1
# 7:  4  1  0  NA
# 8:  4  5 NA   2

The idea here is to operate only on the rows where is.na(b) | b != 0 and generate a sequence of each group size (.N) while updating row in place (using :=). All the rest of the rows will be assigned with NAs by default.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • Argh, back to the old `==` vs `%in%` vs join distinctions http://stackoverflow.com/a/16222108 (related to Arun's suggestion in the comment above). – Frank Aug 19 '15 at 17:01