Subset first n occurrences of certain value in dataframe

Question

Suppose I have a matrix (or dataframe):

1  5  8
3  4  9
3  9  6
6  9  3
3  1  2
4  7  2
3  8  6
3  2  7

I would like to select only the first three rows that have "3" as their first entry, as follows:

3  4  9
3  9  6
3  1  2

It is clear to me how to pull out all rows that begin with "3" and it is clear how to pull out just the first row that begins with "3."

But in general, how can I extract the first n rows that begin with "3"?

Furthermore, how can I select just the 3rd and 4th appearances, as follows:

3  1  2
3  8  6

score 5 · Accepted Answer · edited May 23 '17 at 12:23

Without the need for an extra package:

mydf[mydf$V1==3,][1:3,]

results in:

  V1 V2 V3
2  3  4  9
3  3  9  6
5  3  1  2

When you need the third and fourth matching rows:

mydf[mydf$V1==3,][3:4,]
# or:
mydf[mydf$V1==3,][c(3,4),]

Used data:

mydf <- structure(list(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L), 
                       V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L), 
                       V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L)), 
                  .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -8L))
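
One caveat with the chained indexing used earlier in this answer: if fewer rows match than you ask for (say `[1:10,]` when only five rows have a 3), the result is padded with all-NA rows, and an NA in V1 also produces an NA row. A minimal sketch of a more defensive variant follows. The helper name `nth_matches` and its arguments are hypothetical, made up for illustration, and it assumes `k` holds positive integers.

```r
# Hypothetical helper (not from the original answer): return the k-th
# occurrences of `value` in column `col`, ignoring positions that do not exist.
nth_matches <- function(df, col, value, k) {
  hits <- which(df[[col]] == value)    # which() silently drops NA comparisons
  hits <- hits[k[k <= length(hits)]]   # keep only occurrences that exist
  df[hits, , drop = FALSE]
}

# Same data as above, rebuilt here so the snippet runs on its own
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

first3       <- nth_matches(mydf, "V1", 3, 1:3)   # first three occurrences
third_fourth <- nth_matches(mydf, "V1", 3, 3:4)   # 3rd and 4th occurrences
many         <- nth_matches(mydf, "V1", 3, 1:10)  # only 5 exist, so 5 rows come back
```

Using `which()` instead of a bare logical vector is the design choice that keeps NA values in V1 from leaking into the result.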

Bonus material: besides dplyr, you can also do this very efficiently with data.table (see this answer for speed comparisons of the different data.table methods on large datasets):

library(data.table)
setDT(mydf)[V1==3, head(.SD,3)]
# or:
setDT(mydf)[V1==3, .SD[1:3]]
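
Note that `setDT()` converts `mydf` to a data.table by reference, so the original object changes class. If that is not wanted, a sketch along the following lines works on a copy instead. The name `dt` is just an illustrative choice, and the snippet assumes the data.table package is installed.

```r
library(data.table)

# Same data as above, rebuilt here so the snippet runs on its own
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

dt <- as.data.table(mydf)              # copy; mydf itself stays a plain data.frame

first3       <- dt[V1 == 3, head(.SD, 3)]  # first three rows with V1 == 3
third_fourth <- dt[V1 == 3][3:4]           # filter first, then take rows 3 and 4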

Thank you, Jaap. I appreciate the answer without calling in other packages. — el_dewey, Jan 14 '16 at 17:22

score 2 · Answer 2 · answered Jan 14 '16 at 17:00

You can do something like this with dplyr to extract the first three rows for each unique value of that column:

library(dplyr)
df %>% arrange(columnName) %>% group_by(columnName) %>% slice(1:3)

If you want to extract only the first three rows where that column equals 3, you can try:

df %>% filter(columnName == 3) %>% slice(1:3)

If you want specific rows, pass their positions to slice, for example slice(c(3, 4)).
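
Applied to the `mydf` data from the accepted answer, the calls above might look like the sketch below. The column name `V1` comes from that data; `slice_head()` is an alternative spelling available in dplyr 1.0.0 and later, which is an assumption about your installed version.

```r
library(dplyr)

# Data from the accepted answer, rebuilt here so the snippet runs on its own
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# First three rows where V1 == 3
first3 <- mydf %>% filter(V1 == 3) %>% slice(1:3)

# Same result with slice_head() (assumes dplyr >= 1.0.0)
first3_alt <- mydf %>% filter(V1 == 3) %>% slice_head(n = 3)

# Third and fourth occurrences only
third_fourth <- mydf %>% filter(V1 == 3) %>% slice(c(3, 4))
```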

score 1 · Answer 3 · akrun

We could also use subset:

head(subset(mydf, V1==3),3)

Update

If we also need to extract the row directly below each row where V1==3:

i1 <- with(mydf, V1==3)
mydf[sort(unique(c(which(i1),pmin(which(i1)+1L, nrow(mydf))))),]
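
To see what that one-liner does, here is the same logic split into steps on the example data. The intermediate names `hits`, `below` and `idx` are only for illustration.

```r
# Data from the accepted answer, rebuilt here so the snippet runs on its own
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

i1    <- with(mydf, V1 == 3)
hits  <- which(i1)                     # positions of matching rows: 2 3 5 7 8
below <- pmin(hits + 1L, nrow(mydf))   # row under each match, capped at the last row
idx   <- sort(unique(c(hits, below)))  # merge, drop duplicates, restore row order

res <- mydf[idx, ]                     # here: every row except the first
```

The `pmin()` call is what stops the index from running past the end of the data when the last row itself is a match.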

answered Jan 14 '16 at 17:48 · edited Jan 15 '16 at 16:34


Thank you for your input. This works perfectly! Now suppose I'd like to extract each row where (ColumnName == 3) and also the one row underneath each row that fits the condition, regardless of its contents. – el_dewey Jan 15 '16 at 16:01
