Select rows based on column value in a list of dataframes

Question

I have a list of dataframes and each one looks like this:

df1:

Name	X	Y
AAA	10	5
AAA	20	10
AAA	30	15
AAA	40	20

df2:

Name	X	Y
BBB	20	10
BBB	30	15
BBB	40	20

df3:

Name	X	Y
CCC	10	5
CCC	20	10
CCC	30	15
CCC	40	20

And I have another dataframe like this:

ID	Name
1	AAA
2	CCC
3	FFF

I would like to extract from the list only the dataframes whose Name values appear in the Name column of this last dataframe. So, in this case, I would get only df1 and df3.

Hello, you should make a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). This means using some data available to everyone (and that we can just copy-paste in R) and code that everybody can run. You will greatly improve your chances of having an answer with this. — bretauv, Feb 20 '23 at 16:36

score 4 · Accepted Answer · answered Feb 20 '23 at 16:40

You can do this in base R using lapply and logical indexing. Below, unlist(lapply(ll, function(x) any(x$Name %in% mtch$Name))) tests whether any Name in each data frame of the list appears in the lookup data frame mtch, and returns a logical vector that you can use to index the list.

ll <- list(df1, df2, df3)

ll[unlist(lapply(ll, function(x) any(x$Name %in% mtch$Name)))]

output:

[[1]]
  Name  X  Y
1  AAA 10  5
2  AAA 20 10
3  AAA 30 15
4  AAA 40 20

[[2]]
  Name  X  Y
1  CCC 10  5
2  CCC 20 10
3  CCC 30 15
4  CCC 40 20

Data:

df1  <- read.table(text = "Name X   Y
AAA 10  5
AAA 20  10
AAA 30  15
AAA 40  20", header = TRUE)

df2  <- read.table(text = "Name X   Y
BBB 20  10
BBB 30  15
BBB 40  20", header = TRUE)

df3  <- read.table(text = "Name X   Y
CCC 10  5
CCC 20  10
CCC 30  15
CCC 40  20", header = TRUE)

mtch <- read.table(text = "ID   Name
       1    AAA
       2    CCC
       3    FFF", header = TRUE)
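
The same test can also be written with vapply, which declares the result type up front and so fails loudly if a data frame ever yields something other than a single TRUE or FALSE. This is a minimal sketch rather than part of the answer above: it assumes the list is given names so the result shows which data frames survived, it uses shortened stand-ins for the data, and keep_df and selected are illustrative names.

```r
# Shortened stand-ins for the question's data
df1  <- data.frame(Name = "AAA", X = c(10, 20), Y = c(5, 10))
df2  <- data.frame(Name = "BBB", X = c(20, 30), Y = c(10, 15))
df3  <- data.frame(Name = "CCC", X = c(10, 20), Y = c(5, 10))
mtch <- data.frame(ID = 1:3, Name = c("AAA", "CCC", "FFF"))

# Naming the list elements makes it easy to see which ones are selected
ll <- list(df1 = df1, df2 = df2, df3 = df3)

# vapply() errors unless each call returns exactly one logical value
keep_df <- vapply(ll, function(x) any(x$Name %in% mtch$Name), logical(1))

selected <- ll[keep_df]
names(selected)
```

With these stand-ins, names(selected) should contain df1 and df3.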

score 4 · Answer 2 · answered Feb 20 '23 at 16:51

Using keep from purrr:

library(purrr)
keep(list(df1, df2, df3), ~ unique(.x[[1]]) %in% mtch$Name)

output:

[[1]]
  Name  X  Y
1  AAA 10  5
2  AAA 20 10
3  AAA 30 15
4  AAA 40 20

[[2]]
  Name  X  Y
1  CCC 10  5
2  CCC 20 10
3  CCC 30 15
4  CCC 40 20

akrun


score 3 · Answer 3 · answered Feb 20 '23 at 16:41

With Filter:

l <- list(df1, df2, df3)
Filter(\(x) unique(x[[1]]) %in% mtch$Name, l)

output (data from @jpsmith)

[[1]]
  Name  X  Y
1  AAA 10  5
2  AAA 20 10
3  AAA 30 15
4  AAA 40 20

[[2]]
  Name  X  Y
1  CCC 10  5
2  CCC 20 10
3  CCC 30 15
4  CCC 40 20
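
A caveat for the keep and Filter versions: both expect the predicate to return a single TRUE or FALSE per list element. unique(x[[1]]) %in% mtch$Name does that here only because each data frame holds one distinct Name. The sketch below is a hedged variant for data frames that may contain several names, wrapping the test in any() (or all() to require every name to match). The data frames mixed and only_b are made up for illustration.

```r
mtch   <- data.frame(ID = 1:3, Name = c("AAA", "CCC", "FFF"))

# Hypothetical inputs: one data frame with two different names, one with no match
mixed  <- data.frame(Name = c("AAA", "BBB"), X = c(10, 20), Y = c(5, 10))
only_b <- data.frame(Name = c("BBB", "BBB"), X = c(20, 30), Y = c(10, 15))
l <- list(mixed = mixed, only_b = only_b)

# Keep a data frame if at least one of its names is in mtch$Name
any_match <- Filter(function(x) any(x[[1]] %in% mtch$Name), l)

# Keep a data frame only if all of its names are in mtch$Name
all_match <- Filter(function(x) all(x[[1]] %in% mtch$Name), l)

names(any_match)
length(all_match)
```

Which of any() or all() is appropriate depends on whether a partial match should count.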

score 3 · Answer 4 · answered Feb 20 '23 at 16:57

Here's another tidyverse solution for this:

library(tidyverse)

bind_rows(mylist, .id = "df") %>% 
  filter(Name %in% mymatch$Name) %>% 
  group_split(., df, .keep = FALSE)

#> <list_of<
#>   tbl_df<
#>     Name: character
#>     X   : integer
#>     Y   : integer
#>   >
#> >[2]>
#> [[1]]
#> # A tibble: 4 x 3
#>   Name      X     Y
#>   <chr> <int> <int>
#> 1 AAA      10     5
#> 2 AAA      20    10
#> 3 AAA      30    15
#> 4 AAA      40    20
#> 
#> [[2]]
#> # A tibble: 4 x 3
#>   Name      X     Y
#>   <chr> <int> <int>
#> 1 CCC      10     5
#> 2 CCC      20    10
#> 3 CCC      30    15
#> 4 CCC      40    20

Data:

mylist <- list(df1 = read.table(text = "Name X   Y
                                        AAA 10  5
                                        AAA 20  10
                                        AAA 30  15
                                        AAA 40  20", header = TRUE),
               df2  = read.table(text = "Name X   Y
                                        BBB 20  10
                                        BBB 30  15
                                        BBB 40  20", header = TRUE),
               df3  = read.table(text = "Name X   Y
                                        CCC 10  5
                                        CCC 20  10
                                        CCC 30  15
                                        CCC 40  20", header = TRUE))

mymatch <- read.table(text = "ID   Name
                              1    AAA
                              2    CCC
                              3    FFF", header = TRUE)
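
One difference worth noting: filter() works on rows, so this approach keeps only the matching rows of each data frame, whereas the lapply, keep and Filter answers keep or drop whole data frames. For the question's data the two agree, because each data frame holds a single Name. Here is a small base R sketch of the distinction, using a made-up data frame mixed; whole and rows are illustrative names.

```r
mymatch <- data.frame(ID = 1:3, Name = c("AAA", "CCC", "FFF"))

# Hypothetical data frame containing one matching and one non-matching name
mixed  <- data.frame(Name = c("AAA", "BBB"), X = c(10, 20), Y = c(5, 10))
mylist <- list(mixed = mixed)

# Whole-data-frame selection: the data frame is kept intact if any name matches
is_hit <- vapply(mylist, function(x) any(x$Name %in% mymatch$Name), logical(1))
whole  <- mylist[is_hit]

# Row-level selection: only matching rows survive, empty results are dropped
rows <- lapply(mylist, function(x) x[x$Name %in% mymatch$Name, , drop = FALSE])
rows <- rows[vapply(rows, nrow, integer(1)) > 0]

nrow(whole$mixed)  # both rows are kept
nrow(rows$mixed)   # only the AAA row is kept
```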
