I am trying to iterate through a column in one data frame to filter its values using the values of a column in a separate data frame. I am still learning R and so I'm not sure where I'm going wrong.

This is my latest attempt:

List <- function(df1, df2) {
    for (i in df1$Name) {
        for j in (df2$Name) {
            if (i == j) {
                df1[-c(i), ]
            }
        }
    }
}

Update: the above code is now giving me "error in c(x, ):argument 2 is empty"

Given df1 and df2

df1:

Column 1  Column 2
First     none
Sec       row

df2:

Column 1  Column 2
First     row
Second    row
Third     none

The output would be

Column 1  Column 2
First     row
Reed Merrill
AlexisH
  • Welcome to SO, AlexisH! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (you've got this, nice!), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Jul 11 '23 at 20:29
  • There is almost _certainly_ a more canonical (better, efficient, faster, easier-to-read) method to do what you need in R. Once you [edit] your question to add sample data, I'm confident there will be strong/fast recommendations for what you need. Please be sure to include your expected output, preferably as a manually-generated `data.frame` in the same fashion as your input data is shared. (Note that we likely don't need more than a handful of rows/columns to get the point across.) – r2evans Jul 11 '23 at 20:30
  • Hi, and welcome to stack overflow! I think I see what you're aiming to do here and I'm sure I can help once I'm sure what you're hoping to do. Is it that you want the output to only include rows from df2 that match the value of column 1 in df1? – Reed Merrill Jul 12 '23 at 04:16
  • @ReedMerrill Yes! That's exactly what I am trying to do. If I can be any clearer in my wording, please let me know. Thank you!! – AlexisH Jul 12 '23 at 13:40
  • Ok, I made a slight edit to your original question and changed the title. I'll provide an answer shortly. – Reed Merrill Jul 12 '23 at 14:40

1 Answer

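I think the error in the code example you gave could be coming from a misplaced parenthesis in your second for statement, which should read `for (j in df2$Name)` instead of `for j in (df2$Name)`. Even with that fixed, `df1[-c(i), ]` would fail: `i` holds a name (a character string), not a row number, so it can't be negated, and the result is never assigned or returned.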
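For reference, here is a rough sketch of how the loop could look once the syntax is fixed. It assumes, as in your code, that both data frames have a `Name` column; the function name `keep_matching`, the `Value` column, and the sample data are made up for illustration. Rather than dropping rows inside the loop, it records which rows of `df2` match and subsets once at the end.

```r
# Hypothetical helper: keep the rows of df2 whose Name also appears in df1.
keep_matching <- function(df1, df2) {
  keep <- logical(nrow(df2))          # starts as one FALSE per row of df2
  for (j in seq_len(nrow(df2))) {     # loop over row numbers, not values
    for (i in df1$Name) {
      if (df2$Name[j] == i) {
        keep[j] <- TRUE               # mark row j as a match
      }
    }
  }
  df2[keep, , drop = FALSE]           # return the subset explicitly
}

# Made-up sample data shaped like the example in the question
df1 <- data.frame(Name = c("First", "Sec"), Value = c("none", "row"))
df2 <- data.frame(Name = c("First", "Second", "Third"),
                  Value = c("row", "row", "none"))

keep_matching(df1, df2)
```

This works, but the nested loop compares every pair of values, so it gets slow on large data and is harder to read than a vectorized version.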

There are a number of ways to do this, but the simplest is probably to use base R subsetting.

First, I'll construct some data for the example:

dt1 <- data.frame(
    v1 = c(4, 5, 9, 14, 7, 1),
    v2 = c(12, 9, 17, 4, 2, 1)
)
dt2 <- data.frame(
    v3 = c(4, 0, 1, 10, 7, 1),
    v4 = c("a", "f", "s", "g", "w", "z")
)

Now, we use subsetting. R lets you subset a data frame by putting row and column indexes inside `[]` after the name of the data object: the first position is for rows and the second is for columns. So `dt1[1, 2]` returns the cell in the first row of the second column. If you leave one of these positions blank, R returns every row (or every column) that satisfies the condition given in the other position. With this, we can do more complex subsetting:

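dt3 <- dt2[dt2$v3 %in% dt1$v1, ]

Here, `dt2$v3 %in% dt1$v1` returns a logical vector with one TRUE or FALSE per row of dt2, marking whether that row's v3 value appears anywhere in dt1$v1. The condition has to be built from the data frame being subset (dt2 here) so that it lines up with its rows. Since the column position is blank, R returns all columns for the rows of dt2 where the condition is TRUE. So we're creating dt3, which is a subset of dt2. This gives you:

> dt3
  v3 v4
1  4  a
3  1  s
5  7  w
6  1  z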
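
Applied to data shaped like the example in the question, the same idea might look like the sketch below. The column names `Name` and `Value` are assumptions standing in for "Column 1" and "Column 2", and the data is typed in by hand from the question.

```r
# Sample data mirroring the question; the column names are assumed.
df1 <- data.frame(Name = c("First", "Sec"), Value = c("none", "row"))
df2 <- data.frame(Name = c("First", "Second", "Third"),
                  Value = c("row", "row", "none"))

# %in% returns one TRUE/FALSE per element of its left-hand side
matches <- df2$Name %in% df1$Name
matches
# [1]  TRUE FALSE FALSE

# Logical vector in the row position; blank column position keeps all columns
result <- df2[matches, ]
result
#    Name Value
# 1 First   row
```

If the match ever needs to involve more than one key column, base R's `merge()` is one common alternative to look at.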
Reed Merrill