Subset Data Based On Elements In List

Question

In R, I am trying to subset the data.frame named Data using the elements stored in a list.

Data

Data <- read.table(text = "  Data_x  Data_y  Column_X 
                                -34      12       A
                                -36      20       D
                                -36      12       E
                                -34      18       F
                                -34      10       B
                                -35      24       A
                                -35      16       B
                                -33      22       B
                                -33      14       C
                                -35      22       D", header = TRUE)

Code

variableData <- list("A", "B")
subsetData_1 <- subset(Data, Column_X == variableData[1])
subsetData_2 <- subset(Data, Column_X == variableData[2])
subsetData <- rbind(subsetData_1, subsetData_2)

Problems

First, the number of elements in the list is not fixed. It can be more than two, even more than 100.
Second, I want to end up with a single data.frame that holds all the rows matched by any element in the list. If there are more elements, let's say 100, I don't want to repeat subset() for each of them.

Is there a better way to approach this than the code above? My approach does not scale and will take a performance hit.

Any suggestion will be helpful, thanks.

You might want to read the warning in `?subset`. It is not meant to be used when writing programs. — Frank, Jul 27 '17 at 20:32
@Frank - You scared me: `?subset` says `subset can have unanticipated consequences`. Can you please point me to an example? — Chetan Arvind Patil, Jul 27 '17 at 20:37
Sure. Here's a Q&A on it https://stackoverflow.com/q/9860090/ Tbh, I did not read it carefully and simply remembered to avoid coding with `subset`, so I'm not really an authority on it. — Frank, Jul 27 '17 at 20:41

Alper t. Turker · Answer 1 · 2017-07-27T20:48:59.590

%in% should do the trick:

subset(Data, Column_X %in% variableData)

You can also use dplyr and filter:

library(dplyr)
Data %>% filter(Column_X %in% variableData)

This might require variableData be a vector (instead of the list the OP used). I think the OP should use a vector, anyways. – Frank Jul 27 '17 at 20:30
@Frank - I had the same question about whether to use `list()` or `vector()`. For my problem, I should stick to `variableData <- c("A", "B")` – Chetan Arvind Patil Jul 27 '17 at 20:40
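
To illustrate the list-versus-vector point from the comments, here is a minimal sketch. It assumes an abbreviated copy of the question's Data, and the name `keep` is made up for the example. `unlist()` flattens the list into a plain character vector, which can then be used with `%in%`:

```r
# Abbreviated copy of the question's data (an assumption, to keep the example self-contained)
Data <- data.frame(
  Data_x   = c(-34, -36, -34, -35),
  Data_y   = c(12, 20, 10, 24),
  Column_X = c("A", "D", "B", "A"),
  stringsAsFactors = FALSE
)

variableData <- list("A", "B")

# unlist() turns the list into the character vector c("A", "B")
keep <- unlist(variableData)

# %in% yields one TRUE/FALSE per row, however many values keep holds
result <- Data[Data$Column_X %in% keep, ]
result
```

If the values are under your control from the start, defining them as `c("A", "B")` avoids the conversion altogether.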

loki · Accepted Answer · 2017-07-27T20:42:21.123

Classic lapply.

x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
#
# [[2]]
#   Data_x Data_y Column_X
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

This returns a list of all the subsets. To rbind all these list elements into one data.frame, just call:

do.call(rbind, x)
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

However, as @Frank pointed out, you could use basic subsetting in your code:

Data[Data$Column_X %in% variableData,]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 5    -34     10        B
# 6    -35     24        A
# 7    -35     16        B
# 8    -33     22        B
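
If the filter is needed in several places, the basic subsetting above could be wrapped in a small function. This is only a sketch: `filter_rows` and its argument names are hypothetical, and the data is an abbreviated stand-in for the question's Data.

```r
# Hypothetical helper: keep the rows of df whose value in `column`
# appears in `values` (a list or a vector of any length)
filter_rows <- function(df, column, values) {
  values <- unlist(values)  # accept a list as well as a vector
  # drop = FALSE keeps the result a data.frame even if df has one column
  df[df[[column]] %in% values, , drop = FALSE]
}

# Abbreviated stand-in for the question's data (an assumption)
Data <- data.frame(
  Data_y   = c(12, 20, 12, 18, 10),
  Column_X = c("A", "D", "E", "F", "B"),
  stringsAsFactors = FALSE
)

filtered <- filter_rows(Data, "Column_X", list("A", "B"))
filtered
```

Passing the column name as a string avoids non-standard evaluation, so the function behaves the same whether it is called interactively or from other code.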

"Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)

Furthermore, basic subsetting keeps the original order of your rows, whereas the lapply approach groups them by list element.
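
One way the quoted warning can bite, sketched with a made-up scenario: a variable in the calling environment happens to share its name with a column. The data below is an abbreviated stand-in for the question's Data, and the variable names are chosen only to provoke the clash.

```r
# Abbreviated stand-in for the question's data (an assumption)
Data <- data.frame(
  Data_y   = c(12, 20, 10),
  Column_X = c("A", "D", "B"),
  stringsAsFactors = FALSE
)

# A vector that happens to have the same name as a column
Column_X <- c("A", "B")

# Inside subset(), both mentions of Column_X resolve to the column,
# so the condition is TRUE for every row and nothing is filtered out
all_rows <- subset(Data, Column_X %in% Column_X)
nrow(all_rows)  # 3

# With [ the names are unambiguous: Data$Column_X is the column,
# Column_X is the vector defined above
wanted <- Data[Data$Column_X %in% Column_X, ]
nrow(wanted)  # 2
```

No error or warning is raised in the `subset()` case, which is why this kind of mistake is easy to miss in a longer script.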
