finding first few rows in groups within data.table

Question

With a big data.table, we need to take a quick glance of it from time to time, especially when looking for some patterns. But it's difficult to see through at the first sight with a long table ahead. I would like to ask, can we just get first few rows by each group? Something like DT[1:5, .(X,Y), by=.(Z)]? Here is the ref.code.

library(data.table)
DT <- data.table(X=rep(letters[1:3], 50), Y=rep(LETTERS[1:3],each=50))[,Z:=paste0(X,"-",Y)]

setkey(DT,Z)

DT[,.(X,Y), by=.(Z)]
#        Z X Y
#   1: a-A a A
#   2: a-A a A
#   3: a-A a A
#   4: a-A a A
#   5: a-A a A
#  ---        
# 146: c-C c C
# 147: c-C c C
# 148: c-C c C
# 149: c-C c C
# 150: c-C c C

and I would like to make it this way:

#        Z X Y
#   1: a-A a A
#   2: a-A a A
#   3: a-A a A
#   4: a-A a A
#   5: a-A a A
#   6: b-B a B
#   7: b-B a B
#   8: b-B a B
#   9: b-B a B
#  10: b-B a B
#  11: c-C c C
#  12: c-C c C
#  13: c-C c C
#  14: c-C c C
#  15: c-C c C

just pick out the first few (say,5) rows, so the details won't be folded up. Please advise, Thanks

--------

To express my gratitude to the community, here is my conclusion based on @Frank and many other friends' feedback.

Per the other same topic post, people suggest to generate an index, then produce target table. here are the steps:

make index by .I[] which I'd never heard of.

DT[,.I[1:2], by=Z]

> DT[,.I[1:2], by=Z]
      Z  V1
 1: a-A   1
 2: a-A   2
 3: a-B  18
 4: a-B  19
 5: a-C  35
 6: a-C  36
 7: b-A  51
 8: b-A  52
 9: b-B  68
10: b-B  69
11: b-C  84
12: b-C  85
13: c-A 101
14: c-A 102
15: c-B 117
16: c-B 118
17: c-C 134
18: c-C 135

The V1 is the index! So, I can grab and use it.

Insert the index in to the original DT, and BINGO

That's cool

> DT[DT[,.I[1:2], by=Z]$V1]
    X Y   Z
 1: a A a-A
 2: a A a-A
 3: a B a-B
 4: a B a-B
 5: a C a-C
 6: a C a-C
 7: b A b-A
 8: b A b-A
 9: b B b-B
10: b B b-B
11: b C b-C
12: b C b-C
13: c A c-A
14: c A c-A
15: c B c-B
16: c B c-B
17: c C c-C
18: c C c-C

Thanks, Guys

score 2 · Accepted Answer · answered May 24 '18 at 12:06

2

Would this work?

# adding an index and select on that
DT[ , idx := 1:.N, by = "Z"]
DT[idx <= 5 ]

answered May 24 '18 at 12:06

moooh

459
3
10

4

Fyi, you can do `DT[rowid(Z) <= 5]` too – Frank May 24 '18 at 12:10
1

@Frank Thanks, data.table is so cool. – moooh May 24 '18 at 12:18

finding first few rows in groups within data.table

--------

1 Answers1