Generate sequence within sub group in data.table

Question

I would like to generate a sequence within subgroups. For example, I have two columns, id1 and val, and would like to sort the data by id1 and val and then generate a counter that restarts for each id1.

Input

input <- data.frame("id1"=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5))

Expected Output

id1,val,grp 
1,1,1
1,2,2
1,3,3
1,4,4
2,3,1
2,4,2
2,5,3

Previous Reference Posts :

Count for sub group using .grp in data.table

Numbering rows within groups in a data frame

I used the code below. I am running it on big data and am looking for a solution that does not need an extra step to sort the data by the "val" column before generating the sequence.

setDT(input)[, new1 := seq_len(.N), by = c('id1')]

akrun · Accepted Answer · 2019-11-29T19:13:08.827


We group by 'id1', sort 'val' within each group, and then create 'grp' with row_number(). Note that this sorts only the 'val' column.

library(dplyr)
input %>%
  group_by(id1) %>%
  mutate(val = sort(val), grp = row_number())

Another option is to arrange the rows by 'id1' and 'val' first

input %>%
   arrange(id1, val) %>%
   group_by(id1) %>%
   mutate(grp = row_number())

Or using data.table

library(data.table)
setDT(input)[, c("grp", "val") := .(seq_len(.N), sort(val)), by = id1]
input
#   id1 val grp
#1:   1   1   1
#2:   1   2   2
#3:   1   3   3
#4:   1   4   4
#5:   2   3   1
#6:   2   4   2
#7:   2   5   3
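
The one-liner above sorts only `val` within each group, so any other column stays where it was. A minimal sketch of that effect, assuming an extra column `achar` is present (the names `dt`, `sorted_val_only` and `sorted_rows` are made up for illustration):

```r
library(data.table)

# Same data as the question, plus an extra column `achar`
dt <- data.table(id1 = c(1, 1, 1, 1, 2, 2, 2),
                 val = c(2, 3, 4, 1, 4, 3, 5),
                 achar = c("a", "a", "b", "b", "d", "c", "e"))

# Sort only `val` within each id1; `achar` is not touched
sorted_val_only <- copy(dt)[, c("grp", "val") := .(seq_len(.N), sort(val)), by = id1]

# Reorder whole rows instead, so `achar` travels with its `val`
sorted_rows <- copy(dt)[order(id1, val)][, grp := rowid(id1)]

sorted_val_only$achar  # original order kept: a a b b d c e
sorted_rows$achar      # rows moved together: b a a b c d e
```

With only `id1` and `val` in the table the two results look the same, which is why the difference shows up only once more columns are added.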

If the other columns need to move with 'val' as well, use setorder on 'id1' and 'val' to reorder the rows in place, then create 'grp' as the rowid of 'id1'

input <- data.frame("id1"=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5), 
        achar=c('a','a','b','b','d','c','e'))
setorder(setDT(input), id1, val)[, grp := rowid(id1)][]
#   id1 val achar grp
#1:   1   1     b   1
#2:   1   2     a   2
#3:   1   3     a   3
#4:   1   4     b   4
#5:   2   3     c   1
#6:   2   4     d   2
#7:   2   5     e   3

edited Nov 29 '19 at 19:13

answered Nov 29 '19 at 18:28

akrun


The above solution works perfectly but if you have extra columns in data.table then it will not be adjusted correctly. – R007 Nov 29 '19 at 18:59
@R007 `setDT(input)[, c("grp", "val") := .(seq_len(.N), sort(val)), by = id1]` this would work even if there are other columns, as would the dplyr solutions. I don't understand the issue – akrun Nov 29 '19 at 19:00
I applied the logic to a new data.table and the value for achar was not correct, e.g. input <- data.frame("id1"=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5), achar=c('a','a','b','b','d','c','e')) – R007 Nov 29 '19 at 19:01
it should have sorted data at row level when data was sorted for "val". achar should be in (b,a,a,b,c,d,e) – R007 Nov 29 '19 at 19:07
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/203347/discussion-between-r007-and-akrun). – R007 Nov 29 '19 at 19:09

score 0 · Answer 2 · answered Nov 29 '19 at 18:47

Here's a little factor hack.

# Load library
library(data.table)

# Create data table
input <- data.table(id1=c(1,1,1,1,2,2,2),val=c(2,3,4,1,4,3,5))

input[, foo := as.integer(factor(val)), by = "id1"]

# Print result
input
#>    id1 val foo
#> 1:   1   2   2
#> 2:   1   3   3
#> 3:   1   4   4
#> 4:   1   1   1
#> 5:   2   4   2
#> 6:   2   3   1
#> 7:   2   5   3

# Reorder for comparison with question
input[order(id1, val)]
#>    id1 val foo
#> 1:   1   1   1
#> 2:   1   2   2
#> 3:   1   3   3
#> 4:   1   4   4
#> 5:   2   3   1
#> 6:   2   4   2
#> 7:   2   5   3

Created on 2019-11-29 by the reprex package (v0.3.0)
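
One caveat with the factor approach: factor codes index the distinct values, so rows that share a `val` inside a group receive the same number. A small sketch with hypothetical data containing a tie, compared with `rowid()` on a sorted copy:

```r
library(data.table)

# Hypothetical data with a repeated `val` inside one group
dt <- data.table(id1 = c(1, 1, 1), val = c(5, 5, 7))

# Factor codes index the distinct values, so tied rows share a number
dt[, foo := as.integer(factor(val)), by = id1]

# rowid() on a sorted copy counts rows, so ties get distinct numbers
sorted <- dt[order(id1, val)][, grp := rowid(id1)]

dt$foo      # 1 1 2
sorted$grp  # 1 2 3
```

When `val` is unique within each `id1`, as in the question's data, both approaches give the same numbering.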
