How to select every 5 other observations in R?

Question

I have a dataset that contains 10 "houses" with energy production for every minute of the day. Like so:

HouseID Time KwH
1       1    X
2       1    X
3       1    X
4       1    X
5       1    X
6       1    X
7       1    X
8       1    X
9       1    X
10      1    X
1       2    X
2       2    X
3       2    X
4       2    X
5       2    X
6       2    X
7       2    X
8       2    X
9       2    X
10      2    X

I would like to delete the rows with HouseID 6 through 10, so that I am left with only the observations for HouseID 1, 2, 3, 4 and 5.

score 2 · Accepted Answer · edited May 26 '16 at 12:46

You can try

newdf <- df1[!df1$HouseID %in% 6:10,]
#   HouseID Time KwH
#1        1    1   X
#2        2    1   X
#3        3    1   X
#4        4    1   X
#5        5    1   X
#11       1    2   X
#12       2    2   X
#13       3    2   X
#14       4    2   X
#15       5    2   X

data

df1 <- structure(list(HouseID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
      10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), Time = c(1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
       2L, 2L), KwH = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
       1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "X", 
       class = "factor")), .Names = c("HouseID", "Time", "KwH"), 
       class = "data.frame", row.names = c(NA, -20L))
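
The same filter can also be written by keeping the wanted IDs rather than dropping the unwanted ones. A minimal sketch, assuming integer IDs as in the sample data; the `df1` below is a compact rebuild of that sample, and `keep_ids`, `newdf_in` and `newdf_ex` are illustrative names that are not part of the original answer:

```r
# Rebuild a small version of the sample data (10 houses, 2 time steps).
df1 <- data.frame(
  HouseID = rep(1:10, times = 2),
  Time    = rep(1:2, each = 10),
  KwH     = "X"
)

keep_ids <- 1:5  # illustrative name: the houses to keep

# Keep rows whose HouseID is in keep_ids (inclusion instead of exclusion).
newdf_in <- df1[df1$HouseID %in% keep_ids, ]

# The exclusion form from the answer selects the same rows.
newdf_ex <- df1[!df1$HouseID %in% 6:10, ]
identical(newdf_in, newdf_ex)
```

Which form reads better probably depends on whether the list of houses to keep or the list to drop is the shorter one.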

score -1 · Answer 2 · answered May 26 '16 at 16:11

Assuming df is the name of your data frame then just use the following:

df2 <- subset(df, df$HouseID==1:5)

Amanda R.

Actually, that won't work the way you expect. `%in%` is what you need for making this kind of comparison. And you don't need to repeat `df$`: `subset(df, HouseID %in% 1:5)`. Not very different from @RHertel's, but marginally so (since you're including rather than excluding, and using `subset` rather than `[`-indexing) ... – Ben Bolker May 26 '16 at 16:16

This will work if you are very lucky: it relies on (1) the house IDs being in order in the data frame and (2) the total number of houses being an even multiple of the length of the subset and repeating regularly up to that number. OP's sample data meets those criteria, but it's not a general solution. – Gregor Thomas May 26 '16 at 16:18
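
The recycling problem described in these comments is easy to reproduce. A small sketch, assuming a made-up vector of house IDs that is not in sorted order (the values in `ids` and the names `by_position` and `by_membership` are invented for illustration):

```r
# Hypothetical house IDs in a non-sorted order.
ids <- c(2, 1, 3, 5, 4, 7, 6, 9, 8, 10)

# `==` recycles 1:5 and compares element by element:
# ids[1] == 1, ids[2] == 2, ..., ids[6] == 1, and so on.
by_position <- ids == 1:5

# `%in%` tests each ID for membership in the set 1:5.
by_membership <- ids %in% 1:5

sum(by_position)    # 1: only ids[3] happens to line up with 1:5
sum(by_membership)  # 5: every ID from 1 to 5 is found

# With a data frame, subset() can refer to the column by name.
df <- data.frame(HouseID = ids, KwH = "X")
subset(df, HouseID %in% 1:5)
```

When the length of the column is not a multiple of 5, the `==` version also emits a recycling warning, which is a useful hint that a positional comparison is happening.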
