
I want to subset a dataset by several levels of a categorical variable in RStudio.

With the "subset" function I can do it for just one level:

new_df <- subset(df, df$cat.var == "level.1")

How do I subset by more than one level?

Phil
You can use the `%in%` membership operator with a vector of the factor levels you want to retain rows for. See my full answer below. – Anil Feb 09 '22 at 10:32

1 Answer


You can use %in%.

%in% is a membership operator: x %in% table returns TRUE or FALSE for each element of x, depending on whether that element appears in table. Combine it with a vector of the levels of cat.var whose rows you want to keep.
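To see what %in% does on its own, here is a minimal sketch using a made-up vector x (hypothetical, standing in for a column such as df$cat.var). It returns one logical value per element of the left-hand side, and subset() then keeps the rows where that value is TRUE.

```r
# Hypothetical vector standing in for a column such as df$cat.var
x <- c("level.1", "level.2", "level.3", "level.1")

# One logical value per element of x: is that element among the wanted levels?
keep <- x %in% c("level.1", "level.2")
print(keep)
# [1]  TRUE  TRUE FALSE  TRUE
```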

new_df <- subset(df, cat.var %in% c("level.1", "level.2"))
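If you would rather stay close to the == comparison from the question, you can chain one comparison per level with the element-wise OR operator |. With a couple of levels this selects the same rows as %in%, but it gets verbose as the number of levels grows. A minimal sketch, assuming a small hypothetical df with the question's cat.var column:

```r
# Hypothetical data frame using the column name from the question
df <- data.frame(cat.var = c("level.1", "level.2", "level.3", "level.1"),
                 value = 1:4)

# One == comparison per wanted level, combined with element-wise OR
with_or <- subset(df, cat.var == "level.1" | cat.var == "level.2")

# The %in% version selects the same rows
with_in <- subset(df, cat.var %in% c("level.1", "level.2"))

identical(with_or, with_in)
# [1] TRUE
```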

For example:

df <- data.frame(fct = factor(rep(letters[1:3], times = 2)), nums = 1:6)

df

# This is our example data.frame
#   fct nums
# 1   a    1
# 2   b    2
# 3   c    3
# 4   a    4
# 5   b    5
# 6   c    6

subset(df, fct %in% c("a", "b"))

# Subsetting on a factor using %in% returns the following output:
#   fct nums
# 1   a    1
# 2   b    2
# 4   a    4
# 5   b    5
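One thing to watch when the column really is a factor: subsetting keeps the factor's full set of levels, including levels that no longer have any rows. droplevels() removes the unused ones. A small sketch, rebuilding the example data with an explicit factor() call (needed because data.frame() leaves strings as character by default in R 4.0 and later):

```r
# Example data with fct stored explicitly as a factor
df <- data.frame(fct = factor(rep(letters[1:3], times = 2)), nums = 1:6)

ab <- subset(df, fct %in% c("a", "b"))

# "c" is still a level even though no rows use it
levels(ab$fct)
# [1] "a" "b" "c"

# Drop levels that no longer appear in the data
ab <- droplevels(ab)
levels(ab$fct)
# [1] "a" "b"
```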

Note: another option is the filter() function from dplyr:

library(dplyr)

filter(df, fct %in% c("a", "b"))

This returns the same filtered (subsetted) rows as the subset() call above.
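The same selection also works with plain bracket indexing in base R, with no package needed. The R documentation for subset() describes it as a convenience function intended for interactive use, so [ is often preferred inside functions. A minimal sketch, recreating the example df so it runs on its own:

```r
df <- data.frame(fct = rep(letters[1:3], times = 2), nums = 1:6)

# Logical row index from %in%; the empty column index keeps all columns
ab <- df[df$fct %in% c("a", "b"), ]
ab
#   fct nums
# 1   a    1
# 2   b    2
# 4   a    4
# 5   b    5
```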

Anil