Filter dataframe with values from second dataframe

Question

I have two dataframes a and b.

a can be 2000-3000 rows long with 15 columns.

b is a small dataframe (2 columns with 150 rows).

Below a simplified dataset.

   a <- structure(list(ID = structure(c(1L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("ID1", 
    "ID2", "ID3"), class = "factor"), score = structure(c(4L, 5L, 
    3L, 6L, 7L, 1L, 2L), .Label = c("10", "110", "20", "28", "34", 
    "80", "90"), class = "factor"), desc = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L), class = "factor", .Label = "text")), .Names = c("ID", 
    "score", "desc"), row.names = c(NA, -7L), class = "data.frame")

  b <- structure(list(ID = structure(1:3, .Label = c("ID1", "ID2", "ID3"
), class = "factor"), cutoff = structure(1:3, .Label = c("12", 
"46", "54"), class = "factor")), .Names = c("ID", "cutoff"), row.names = c(NA, 
-3L), class = "data.frame")

I would like to filter dataframe a using the scores from dataframe b. For example in dataframe b the ID "ID1" has a cutoff of 12, so i only want to keep ID1 from dataframe a above or equal to 12. I would like to do this for all IDS.

> a
   ID score desc
1 ID1    28 text
2 ID2    34 text
3 ID1    20 text
4 ID3    80 text
5 ID2    90 text
6 ID1    10 text
7 ID3   110 text
> b
   ID cutoff
1 ID1     12
2 ID2     46
3 ID3     54

Given the cutoffs in dataframe b the final dataframe a should remain as follows:

> a
   ID score desc
1 ID1    28 text
2 ID1    20 text
3 ID3    80 text
4 ID2    90 text
5 ID3   110 text

What have you tried? This is a simple merge and filter problem, see some tutorials. Also, please store numbers as numbers. — zx8754, Feb 13 '18 at 08:47

moodymudskipper · Answer 1 · 2018-02-13T09:04:19.983

3

with base R:

subset(merge(a,b),as.numeric(as.character(score)) > as.numeric(as.character(cutoff)),1:3)
#    ID score desc
# 1 ID1    28 text
# 2 ID1    20 text
# 5 ID2    90 text
# 6 ID3    80 text
# 7 ID3   110 text

Or with dplyr:

library(dplyr)
a %>%
  left_join(b) %>%
  filter(as.numeric(as.character(score)) > as.numeric(as.character(cutoff))) %>%
  select(-cutoff)

#    ID score desc
# 1 ID1    28 text
# 2 ID1    20 text
# 3 ID3    80 text
# 4 ID2    90 text
# 5 ID3   110 text

edited Feb 13 '18 at 09:04

answered Feb 13 '18 at 08:37

moodymudskipper

46,417
11
121
167

fixed thanks, inserted `as.character` in there – moodymudskipper Feb 13 '18 at 09:01

Tim Biegeleisen · Answer 2 · 2018-02-13T08:48:08.563

Here is a base R option:

df <- merge(a, b, by="ID")
index <- as.numeric(levels(df$score))[df$score] >
    as.numeric(levels(df$cutoff))[df$cutoff]
df[index, -which(names(df) %in% c("cutoff"))]

   ID score desc
1 ID1    28 text
2 ID1    20 text
5 ID2    90 text
6 ID3    80 text
7 ID3   110 text

Demo

Note: A bit of work converting your factor scores and cutoffs to numeric values. Unless you plan on having many repeated values, consider using a numeric type to store this information.

score 1 · Accepted Answer · answered Feb 13 '18 at 08:34

You can try the following. We first convert the score and cutoff columns to numeric values, since they are now factors. We then subset from the dataframe, using match to find the corresponding cutoff from b for each entry in a.

a$score = as.numeric(as.character(a$score))
b$cutoff= as.numeric(as.character(b$cutoff))
subset(a,score>=b$cutoff[match(a$ID,b$ID)])

Output:

   ID score desc
1 ID1    28 text
3 ID1    20 text
4 ID3    80 text
5 ID2    90 text
7 ID3   110 text

Hope this helps!

Filter dataframe with values from second dataframe

3 Answers3

Demo