How to get the rows of a dataframe together (selected row bottom up)

Question

I'll explain my question with an example. Suppose I am at row 5 of the table below: how can I get the rows before row 5 that have the same P_P value? The catch is that the selected rows must be consecutive, forming an unbroken run that ends directly above the current row. For the table below I need only rows 3 and 4, because row 2, which has a different P_P, separates row 1 from the rest. FYI, I could do this with a for loop, but I want to avoid one.

Thanks

ID   Contest   P_P   Time
1      UMA      A    2015
2      DOIS     B    2016
3      DOIS     A    2016
4      UMA      A    2017
5      DOIS     A    2017

http://stackoverflow.com/questions/41535081/increase-counter-upon-change-of-value http://stackoverflow.com/questions/29661269/increment-by-1-for-every-change-in-column-in-r ... and then subsetting. — jogo, Mar 29 '17 at 12:16
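A minimal sketch of the approach the comment points to: build a counter that increases every time P_P changes from the previous row, then subset on the selected row's counter. The column name `run`, the variable `above`, and building the data with data.frame() (character columns instead of the factors in the answers' dput() data) are illustrative choices, not part of the linked answers.

```r
# the example table from the question (character columns instead of factors)
d <- data.frame(ID = 1:5,
                Contest = c("UMA", "DOIS", "DOIS", "UMA", "DOIS"),
                P_P = c("A", "B", "A", "A", "A"),
                Time = c(2015, 2016, 2016, 2017, 2017))

# TRUE where P_P differs from the previous row; the first row always starts a run
changed <- c(TRUE, d$P_P[-1] != d$P_P[-nrow(d)])

# running count of changes = run number of every row (here 1, 2, 3, 3, 3)
d$run <- cumsum(changed)

i <- 5
# rows in the same run as row i, strictly above it (rows 3 and 4)
above <- d[d$run == d$run[i] & seq_len(nrow(d)) < i, ]
above
```

Because the counter only increases at a change, two runs of "A" separated by a "B" get different numbers, which is exactly what keeps row 1 out of the result.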

score 3 · Accepted Answer · answered Mar 29 '17 at 11:58

You could do this in base R:

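rw <- 5
# rows 1..(rw-1) whose P_P differs from row rw; keep rows after the last one
# (assumes some earlier row differs and that row rw-1 itself matches)
df[(max(which(!(df[1:(rw-1),]$P_P==df[rw,]$P_P)))+1):(rw-1),]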

# ID Contest P_P Time
#3  3    DOIS   A 2016
#4  4     UMA   A 2017

The idea is to first find matches among rows 1 through rw-1 (i.e., df[1:(rw-1),]$P_P==df[rw,]$P_P), then find the last non-match (i.e., the last FALSE) with max(which(!...)). The wanted rows run from just after that non-match up to rw-1.
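The one-liner fails when no earlier row differs (max() of an empty which() is -Inf), and it returns the wrong rows when row rw-1 itself has a different P_P (for rw = 3 it yields rows 3 and 2). A hedged sketch that guards both cases by treating "no earlier mismatch" as position 0 and keeping only positions after the last mismatch; `rows_before_same_run` is a hypothetical helper name, not a base R function:

```r
# example table from the question (character columns instead of factors)
df <- data.frame(ID = 1:5,
                 Contest = c("UMA", "DOIS", "DOIS", "UMA", "DOIS"),
                 P_P = c("A", "B", "A", "A", "A"),
                 Time = c(2015, 2016, 2016, 2017, 2017))

# hypothetical helper: the unbroken run of rows directly above row `rw`
# that share row rw's P_P value (zero rows if row rw-1 already differs)
rows_before_same_run <- function(df, rw) {
  idx <- seq_len(rw - 1)                    # positions above rw (empty if rw = 1)
  differs <- df$P_P[idx] != df$P_P[rw]      # TRUE where P_P differs from row rw
  last_diff <- max(c(0L, which(differs)))   # 0 if nothing above differs
  df[idx[idx > last_diff], ]                # rows after the last mismatch
}

rows_before_same_run(df, 5)  # rows 3 and 4
rows_before_same_run(df, 3)  # no rows: row 2 has P_P "B"
```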

df <- structure(list(ID = 1:5, Contest = structure(c(2L, 1L, 1L, 2L, 
1L), .Label = c("DOIS", "UMA"), class = "factor"), P_P = structure(c(1L, 
2L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), Time = c(2015L, 
2016L, 2016L, 2017L, 2017L)), .Names = c("ID", "Contest", "P_P", 
"Time"), class = "data.frame", row.names = c(NA, -5L))

score 2 · Answer 2 · answered Mar 29 '17 at 12:37

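row <- 5

## get the rows above `row` (bottom up) whose P_P equals that of `row`
## (renamed from `subset` to avoid reusing the function's name)
sel <- subset(df[(row-1):1,], P_P == df[row,]$P_P)

## positions where consecutive IDs are not adjacent, i.e. the run is broken
a <- which(abs(diff(sel$ID)) != 1)

## keep rows up to the first break (all rows if there is no break)
sel[seq_len(if (length(a)) a[1] else nrow(sel)),]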

# ID Contest P_P Time
# 4  4     UMA   A 2017
# 3  3    DOIS   A 2016
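Answer 2 relies on ID matching the row position. A different vectorized trick, not taken from the answer above, avoids that assumption: compare each row above `row` with it, read the matches upward with rev(), and use cumprod() so the count stops at the first mismatch. A sketch on the example data; the names `same_upward`, `n_run`, and `res` are illustrative:

```r
# example table from the question (character columns instead of factors)
df <- data.frame(ID = 1:5,
                 Contest = c("UMA", "DOIS", "DOIS", "UMA", "DOIS"),
                 P_P = c("A", "B", "A", "A", "A"),
                 Time = c(2015, 2016, 2016, 2017, 2017))

row <- 5
# TRUE for each row above `row` whose P_P matches, read upward from row-1
same_upward <- rev(df$P_P[seq_len(row - 1)] == df$P_P[row])

# cumprod() stays 1 until the first mismatch and is 0 from there on,
# so its sum is the length of the unbroken run directly above `row`
n_run <- sum(cumprod(same_upward))

# the last n_run positions before `row` (zero rows if n_run is 0)
res <- df[seq_len(n_run) + (row - 1 - n_run), ]
res
```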

Answer 3 · answered by jogo, Mar 29 '17 at 12:52

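Here is a solution with rev() and rle(). It takes the last run of equal P_P values in d, so it assumes the selected row is the last row of d: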

tail(d, rle(rev(as.integer(d$P_P)))$lengths[1]) # with last row
head(tail(d, rle(rev(as.integer(d$P_P)))$lengths[1]), -1) # without last row
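For a selected row i that is not the last row, a sketch of the same rle() idea: truncate d at row i first, so the last run is the one ending at row i. The variable names `d_upto`, `run_len`, `with_row`, and `without_row` are illustrative:

```r
# example table from the question (character columns instead of factors)
d <- data.frame(ID = 1:5,
                Contest = c("UMA", "DOIS", "DOIS", "UMA", "DOIS"),
                P_P = c("A", "B", "A", "A", "A"),
                Time = c(2015, 2016, 2016, 2017, 2017))

i <- 4
d_upto <- d[seq_len(i), ]                                  # drop rows below row i
run_len <- rle(rev(as.character(d_upto$P_P)))$lengths[1]   # length of the run ending at row i
with_row <- tail(d_upto, run_len)                          # rows 3 and 4
without_row <- head(with_row, -1)                          # row 3 only
```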

Another solution:
We can use inverse.rle() to build a grouping variable:

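r <- rle(as.character(d$P_P)) # rle() rejects factors; also possible: r <- rle(as.integer(d$P_P))
r$values <- seq_along(r$values) # number the runs 1, 2, 3, ... (safer than seq() when there is one run)
d$group <- inverse.rle(r)       # expand the run numbers back to one per row
i <- 5
d[d$group==d$group[i],]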

result:

#  ID Contest P_P Time group
#3  3    DOIS   A 2016     3
#4  4     UMA   A 2017     3
#5  5    DOIS   A 2017     3

If you want a result without the row i:

subset(d[-i,], group==d$group[i])

data:

d <- structure(list(ID = 1:5, Contest = structure(c(2L, 1L, 1L, 2L, 
1L), .Label = c("DOIS", "UMA"), class = "factor"), P_P = structure(c(1L, 
2L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), Time = c(2015L, 
2016L, 2016L, 2017L, 2017L)), .Names = c("ID", "Contest", "P_P", 
"Time"), class = "data.frame", row.names = c(NA, -5L))
