I have a data frame with 26 columns and 1000 rows, and a list of 20 values. I'd like to select only the rows of the data frame that contain any (one or more) of the values in my list.

I have tried subset, and subset combined with filter. Here is the list of values:

dx.codes <- c(4140 , 4111 , 4118 , 41181 , 41189 , 412 , 4130 , 4131 , 4139 , 4140 , 41400 , 41401 , 41406 , 4142 , 4143 , 4144 , 4148 , 4149 , "V4581", "V4582")

df <- subset(sample.df, subset.df[1:1000, ] %in% dx.codes)

That subset returns a new data frame, but without any observations. Looking at the initial data frame I know there are rows containing those values, however I can't get them to show up in the new data frame.
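A likely reason, sketched on made-up data (`toy` and `codes` are hypothetical stand-ins, not the real data): `%in%` treats a data frame as a list of columns, so the test yields one value per column rather than one per row or per cell, and it cannot pick out rows. Applying `%in%` to each column separately gives a per-cell result instead.

```r
# Hypothetical toy data frame standing in for the real data.
toy <- data.frame(a = c(4111, 10), b = c(20, 412))
codes <- c(4111, 412)

# %in% sees the data frame as a list of columns, so the result has
# one element per column, not one per row or per cell.
per_column <- toy %in% codes
length(per_column) == ncol(toy)

# Applying %in% column by column gives one logical value per cell.
per_cell <- sapply(toy, `%in%`, codes)
dim(per_cell)
```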

    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. (We don't need all 1000 rows or 20 columns to test solutions, but something that reasonably approximates your data would be very helpful). – MrFlick Oct 16 '19 at 19:12

2 Answers

Assuming these 20 values can be found in any of the 26 columns, you could use the following code:

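library(tidyverse)

# Keep rows where any column holds a value from dx.codes
# (filter_all still works but is superseded in dplyr 1.0 and later)
df %>%
  filter_all(any_vars(. %in% dx.codes))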
Matt

In base R, you could use sapply to test every cell of the data frame against the codes, then use rowSums to build the row index:

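# Example data: 1000 rows x 26 columns of random integers
df1 <- as.data.frame(matrix(sample(1:52000, 26000), nrow = 1000), stringsAsFactors = FALSE)

# Keep rows where at least one cell matches a value in dx.codes
df1[rowSums(sapply(df1, `%in%`, dx.codes)) > 0, ]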
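
To check the indexing step by eye, here is a minimal sketch on a small made-up data frame (`toy`, `codes`, and the column names are hypothetical, not the asker's data). It also shows that mixing numbers and strings in c() coerces every code to character, which is why the codes are compared as strings here.

```r
# Hypothetical toy data: three code columns, four rows.
toy <- data.frame(
  dx1 = c("4111", "2500", "V4581", "7840"),
  dx2 = c("4019", "412",  "2724",  "7295"),
  dx3 = c("2720", "5990", "4280",  "3051"),
  stringsAsFactors = FALSE
)

# Mixing numbers and strings in c() coerces everything to character.
codes <- c(4111, 412, "V4581")

# One logical column per data frame column: TRUE where the cell is in codes.
hits <- sapply(toy, `%in%`, codes)

# Keep rows with at least one TRUE; the last row has no match and is dropped.
keep <- rowSums(hits) > 0
result <- toy[keep, , drop = FALSE]
```

Building `hits` first makes it easy to inspect which cells matched before any rows are dropped.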
Andrew