2

I have two dataframes a and b.

a can be 2000-3000 rows long with 15 columns.

b is a small dataframe (2 columns with 150 rows).

Below a simplified dataset.

   a <- structure(list(ID = structure(c(1L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("ID1", 
    "ID2", "ID3"), class = "factor"), score = structure(c(4L, 5L, 
    3L, 6L, 7L, 1L, 2L), .Label = c("10", "110", "20", "28", "34", 
    "80", "90"), class = "factor"), desc = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L), class = "factor", .Label = "text")), .Names = c("ID", 
    "score", "desc"), row.names = c(NA, -7L), class = "data.frame")

  b <- structure(list(ID = structure(1:3, .Label = c("ID1", "ID2", "ID3"
), class = "factor"), cutoff = structure(1:3, .Label = c("12", 
"46", "54"), class = "factor")), .Names = c("ID", "cutoff"), row.names = c(NA, 
-3L), class = "data.frame")

I would like to filter dataframe a using the scores from dataframe b. For example in dataframe b the ID "ID1" has a cutoff of 12, so i only want to keep ID1 from dataframe a above or equal to 12. I would like to do this for all IDS.

> a
   ID score desc
1 ID1    28 text
2 ID2    34 text
3 ID1    20 text
4 ID3    80 text
5 ID2    90 text
6 ID1    10 text
7 ID3   110 text
> b
   ID cutoff
1 ID1     12
2 ID2     46
3 ID3     54

Given the cutoffs in dataframe b the final dataframe a should remain as follows:

> a
   ID score desc
1 ID1    28 text
2 ID1    20 text
3 ID3    80 text
4 ID2    90 text
5 ID3   110 text
david
  • 805
  • 1
  • 9
  • 21
  • 1
    What have you tried? This is a simple merge and filter problem, see some tutorials. Also, please store numbers as numbers. – zx8754 Feb 13 '18 at 08:47

3 Answers3

3

with base R:

subset(merge(a,b),as.numeric(as.character(score)) > as.numeric(as.character(cutoff)),1:3)
#    ID score desc
# 1 ID1    28 text
# 2 ID1    20 text
# 5 ID2    90 text
# 6 ID3    80 text
# 7 ID3   110 text

Or with dplyr:

library(dplyr)
a %>%
  left_join(b) %>%
  filter(as.numeric(as.character(score)) > as.numeric(as.character(cutoff))) %>%
  select(-cutoff)

#    ID score desc
# 1 ID1    28 text
# 2 ID1    20 text
# 3 ID3    80 text
# 4 ID2    90 text
# 5 ID3   110 text
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
2

Here is a base R option:

df <- merge(a, b, by="ID")
index <- as.numeric(levels(df$score))[df$score] >
    as.numeric(levels(df$cutoff))[df$cutoff]
df[index, -which(names(df) %in% c("cutoff"))]

   ID score desc
1 ID1    28 text
2 ID1    20 text
5 ID2    90 text
6 ID3    80 text
7 ID3   110 text

Demo

Note: A bit of work converting your factor scores and cutoffs to numeric values. Unless you plan on having many repeated values, consider using a numeric type to store this information.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
1

You can try the following. We first convert the score and cutoff columns to numeric values, since they are now factors. We then subset from the dataframe, using match to find the corresponding cutoff from b for each entry in a.

a$score = as.numeric(as.character(a$score))
b$cutoff= as.numeric(as.character(b$cutoff))
subset(a,score>=b$cutoff[match(a$ID,b$ID)])

Output:

   ID score desc
1 ID1    28 text
3 ID1    20 text
4 ID3    80 text
5 ID2    90 text
7 ID3   110 text

Hope this helps!

Florian
  • 24,425
  • 4
  • 49
  • 80