I have a dataset containing the energy production of 10 "houses" for every minute of the day, like so:

HouseID Time KwH
1       1    X
2       1    X
3       1    X
4       1    X
5       1    X
6       1    X
7       1    X
8       1    X
9       1    X
10      1    X
1       2    X
2       2    X
3       2    X
4       2    X
5       2    X
6       2    X
7       2    X
8       2    X
9       2    X
10      2    X

I would like to delete the rows with HouseID 6 through 10, so that I am left with only the observations for HouseID 1, 2, 3, 4 and 5.

Tunaki
Niels

2 Answers

2

You can try excluding the unwanted IDs by negating `%in%`:

newdf <- df1[!df1$HouseID %in% 6:10,]
#   HouseID Time KwH
#1        1    1   X
#2        2    1   X
#3        3    1   X
#4        4    1   X
#5        5    1   X
#11       1    2   X
#12       2    2   X
#13       3    2   X
#14       4    2   X
#15       5    2   X

data

df1 <- structure(list(HouseID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
      10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), Time = c(1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
       2L, 2L), KwH = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
       1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "X", 
       class = "factor")), .Names = c("HouseID", "Time", "KwH"), 
       class = "data.frame", row.names = c(NA, -20L))
akrun
RHertel
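As a quick check of the `%in%` approach, the sketch below rebuilds the sample data in a shorter form and compares three ways of keeping houses 1 to 5. The `rep()` construction and the names `excluded`, `kept` and `by_value` are assumptions made for illustration (the answer itself uses `structure()`), and KwH is stored here as a plain character placeholder rather than a factor.

```r
# Assumed compact rebuild of the sample data: 10 houses, 2 time steps
df1 <- data.frame(HouseID = rep(1:10, times = 2),
                  Time    = rep(1:2, each = 10),
                  KwH     = "X")

# Drop houses 6 to 10 by negating the membership test
excluded <- df1[!df1$HouseID %in% 6:10, ]

# Equivalent: keep houses 1 to 5 directly
kept <- df1[df1$HouseID %in% 1:5, ]

# Also equivalent here, because the IDs to keep form a contiguous range
by_value <- df1[df1$HouseID <= 5, ]

identical(excluded, kept)      # TRUE
identical(excluded, by_value)  # TRUE
```

The `<=` form only applies when the wanted IDs happen to be a contiguous range; `%in%` works for any set of IDs.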
-1

Assuming df is the name of your data frame, just use the following:

df2 <- subset(df, df$HouseID==1:5)
Amanda R.
    actually, that won't work the way you expect. `%in%` is what you need for making this kind of comparison. And you don't need to repeat `df$`: `subset(df, HouseID %in% 1:5)`. Not very different from @RHertel's, but marginally (since you're including rather than excluding, and using `subset` rather than `[`-indexing) ... – Ben Bolker May 26 '16 at 16:16
    This will work if you are very lucky: it relies on (1) the house IDs being in order in the data frame and (2) the total number of houses being an even multiple of the length of the subset and repeating regularly up to that number. OP's sample data meets those criteria, but it's not a general solution. – Gregor Thomas May 26 '16 at 16:18
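The point made in the comments can be seen on a small example. The sketch below uses a hypothetical data frame in which the same 10 house IDs appear in shuffled order (the row order and the names `by_equals` and `by_in` are assumptions for illustration). With `==`, R recycles `1:5` along the column and compares position by position; with `%in%`, each ID is tested for membership in the set.

```r
# Hypothetical data: the same 10 houses, but rows are not in ID order
df <- data.frame(HouseID = c(3L, 1L, 7L, 5L, 2L, 10L, 4L, 6L, 9L, 8L),
                 Time    = 1L)

# `==` compares HouseID against 1,2,3,4,5,1,2,3,4,5 element by element
by_equals <- subset(df, HouseID == 1:5)

# `%in%` asks, for every row, whether HouseID is one of 1:5
by_in <- subset(df, HouseID %in% 1:5)

by_equals$HouseID  # integer(0): no row happens to line up with the recycled vector
by_in$HouseID      # 3 1 5 2 4
```

No warning is raised in the `==` case because the column length (10) is a whole multiple of the length of `1:5`, so the mistake is silent.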