
In R, I am trying to subset the data.frame named Data using elements stored in a list.

Data

Data <- read.table(text = "  Data_x  Data_y  Column_X 
                                -34      12       A
                                -36      20       D
                                -36      12       E
                                -34      18       F
                                -34      10       B
                                -35      24       A
                                -35      16       B
                                -33      22       B
                                -33      14       C
                                -35      22       D", header = TRUE)

Code

variableData <- list("A", "B")
subsetData_1 <- subset(Data, Column_X == variableData[1])
subsetData_2 <- subset(Data, Column_X == variableData[2])
subsetData <- rbind(subsetData_1, subsetData_2)

Problems

  • First, the list can contain more than two elements, and the number is not fixed. It can even exceed 100 elements.
  • Second, I want to keep a single data.frame that stores all the rows extracted for every element in the list. If there are many elements, let's say 100, I don't want to repeat subset() for each one.

Is there a better approach than the code above? Mine does not scale and will take a performance hit.

Any suggestions would be helpful. Thanks.

loki
Chetan Arvind Patil
  • You might want to read the warning in `?subset`. It is not meant to be used when writing programs. – Frank Jul 27 '17 at 20:32
  • @Frank - You scared me: `?subset` says `subset can have unanticipated consequences`. Can you please point me to an example? – Chetan Arvind Patil Jul 27 '17 at 20:37
  • Sure. Here's a Q&A on it https://stackoverflow.com/q/9860090/ Tbh, I did not read it carefully and simply remembered to avoid coding with `subset`, so I'm not really an authority on it. – Frank Jul 27 '17 at 20:41
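To make the warning concrete, here is a minimal sketch of one such consequence on the question's data. The outer variable `Data_y` is hypothetical and chosen only because it collides with a column name:

```r
# Same data as in the question.
Data <- data.frame(
  Data_x   = c(-34, -36, -36, -34, -34, -35, -35, -33, -33, -35),
  Data_y   = c(12, 20, 12, 18, 10, 24, 16, 22, 14, 22),
  Column_X = c("A", "D", "E", "F", "B", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical outer variable that shares its name with a column of Data.
Data_y <- 12

# subset() looks names up inside Data first, so BOTH sides of the
# comparison refer to the column and the condition is TRUE for every row.
via_subset <- subset(Data, Data_y == Data_y)

# With `[`, Data$Data_y is the column and Data_y is the outer value 12.
via_bracket <- Data[Data$Data_y == Data_y, ]

nrow(via_subset)   # 10: every row is returned
nrow(via_bracket)  # 2: rows 1 and 3, where Data_y is 12
```

No error or warning is raised in the `subset()` case; the filter silently does nothing.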

2 Answers


%in% should do the trick:

subset(Data, Column_X %in% variableData)
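`%in%` builds one logical vector with a TRUE/FALSE per row, so a single `subset()` call handles any number of values. A short sketch on the question's data, assuming `variableData` is the character vector `c("A", "B")`:

```r
# Same data as in the question.
Data <- data.frame(
  Data_x   = c(-34, -36, -36, -34, -34, -35, -35, -33, -33, -35),
  Data_y   = c(12, 20, 12, 18, 10, 24, 16, 22, 14, 22),
  Column_X = c("A", "D", "E", "F", "B", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)
variableData <- c("A", "B")

# One TRUE/FALSE per row: is this row's Column_X among the wanted values?
keep <- Data$Column_X %in% variableData
keep
# [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

# The same single mask selects rows for 2 values or for 100.
selected <- subset(Data, keep)
nrow(selected)
# [1] 5
```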

You can also use dplyr and filter:

library(dplyr)
Data %>% filter(Column_X %in% variableData)
Alper t. Turker
  • This might require that variableData be a vector (instead of the list the OP used). I think the OP should use a vector anyway. – Frank Jul 27 '17 at 20:30
  • @Frank - I had the same question about whether to use `list()` or `vector()`. For my problem, I should stick to `variableData <- c("A", "B")` – Chetan Arvind Patil Jul 27 '17 at 20:40
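If the values already arrive as a list, as in the question, `unlist()` is one way to get the character vector recommended in these comments. A minimal sketch:

```r
# The list from the question.
variableData <- list("A", "B")

# unlist() flattens a list of single strings into a character vector.
variableData <- unlist(variableData)
variableData
# [1] "A" "B"
is.character(variableData)
# [1] TRUE
```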

This is a classic use case for `lapply`:

# one subset() call per element of variableData, collected in a list
x <- lapply(variableData, function(v) subset(Data, Column_X == v))
x
# [[1]]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
#
# [[2]]
#   Data_x Data_y Column_X
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

It returns a list containing one subset per element. To combine all of these list elements into one data.frame with rbind, use:

do.call(rbind, x)
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B
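`do.call()` calls `rbind` with the list elements as separate arguments, so it works for however many subsets `lapply` produced. A small sketch on the question's data (with `variableData` assumed to be `c("A", "B")`) showing that it equals the hand-written two-argument call:

```r
# Same data as in the question.
Data <- data.frame(
  Data_x   = c(-34, -36, -36, -34, -34, -35, -35, -33, -33, -35),
  Data_y   = c(12, 20, 12, 18, 10, 24, 16, 22, 14, 22),
  Column_X = c("A", "D", "E", "F", "B", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)
variableData <- c("A", "B")

# One data.frame per value, as in the lapply answer.
x <- lapply(variableData, function(v) subset(Data, Column_X == v))

# do.call(rbind, x) builds and evaluates rbind(x[[1]], x[[2]], ...).
combined <- do.call(rbind, x)
identical(combined, rbind(x[[1]], x[[2]]))
# [1] TRUE
rownames(combined)
# [1] "1" "6" "5" "7" "8"
```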

However, as @Frank pointed out, you could use basic subsetting with `[` in your code:

Data[Data$Column_X %in% variableData,]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 5    -34     10        B
# 6    -35     24        A
# 7    -35     16        B
# 8    -33     22        B

"Warning: This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)

Furthermore, this approach keeps the rows in their original order.
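The difference in row order between the two approaches can be checked directly. A sketch on the question's data, assuming `variableData <- c("A", "B")`:

```r
# Same data as in the question.
Data <- data.frame(
  Data_x   = c(-34, -36, -36, -34, -34, -35, -35, -33, -33, -35),
  Data_y   = c(12, 20, 12, 18, 10, 24, 16, 22, 14, 22),
  Column_X = c("A", "D", "E", "F", "B", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)
variableData <- c("A", "B")

# One subset per value, stacked: rows come out grouped by value (A, then B).
grouped <- do.call(rbind, lapply(variableData, function(v) Data[Data$Column_X == v, ]))

# A single logical index: rows stay in the order they have in Data.
in_order <- Data[Data$Column_X %in% variableData, ]

rownames(grouped)
# [1] "1" "6" "5" "7" "8"
rownames(in_order)
# [1] "1" "5" "6" "7" "8"

# Same rows either way; only the order differs.
setequal(rownames(grouped), rownames(in_order))
# [1] TRUE
```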

loki