filter a dataframe with another dataframe

Question

Hey all, I have a data frame df1 and I want to filter it using another data frame, df2.

I want to filter the columns of df1 by the row names of df2, so that I keep only the columns of df1 whose names appear as row names in df2.

I tried the following code but an error occurred.

df1 %>% filter(colnames(df1) %in% rownames(df2))

The error

Error in `filter()`:
i In argument: `colnames(df1) %in% rownames(df2)`.
Caused by error:
! `..1` must be of size 46 or 1, not size 163.
Run `rlang::last_trace()` to see where the error occurred.

df1 is 46 x 163 and df2 is 151 x 6.

What am I doing wrong?

Edit: here is some dummy data.

df1
   sp1 sp2 sp3 sp4
1   0   1   1   2
2   1   1   1   0
3   0   2   0   1
4   0   1   1   0
5   1   1   0   1

df2
    x1  x2  x3  x4
sp1 10  0.1  1   a
sp2 11  0.5  2   b
sp3 12  0.1  3   c

and my output should be:

   sp1 sp2 sp3 
1   0   1   1   
2   1   1   1   
3   0   2   0   
4   0   1   1   
5   1   1   0

Hi Mike, it's difficult to provide help without a reproducible example. Could you edit your question to make the error reproducible? For some tips on how to do that, see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — jpsmith, Aug 01 '23 at 17:26
`filter` is used for keeping only certain rows. `select` is used for keeping only certain columns. I think you're looking for `df1 |> select(all_of(rownames(df2)))`, but as jpsmith says without a reproducible example it's very hard to tell. — Gregor Thomas, Aug 01 '23 at 17:28
`filter()` can only be used to subset rows. It cannot subset columns. For that you would use `select()`. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Aug 01 '23 at 17:28
@GregorThomas might be better to use `any_of` since not all `rownames` may be in `colnames`. — LMc, Aug 01 '23 at 17:34

score 1 · Accepted Answer · answered Aug 01 '23 at 17:32

df1 %>% select(colnames(df1)[colnames(df1) %in% rownames(df2)])

zchmielewska

jpsmith · Answer 2 · 2023-08-01T18:18:43.213

In base R you can simply do:

df1[colnames(df1) %in% rownames(df2)]

#  sp1 sp2 sp3
#1   0   1   1
#2   1   1   1
#3   0   2   0
#4   0   1   1
#5   1   1   0

In dplyr, you can use base::intersect() to find the column names of df1 that also appear among the row names of df2, and dplyr::select() to keep only those columns.

You could take this approach:

df1 %>% 
  select(intersect(rownames(df2), colnames(.)))

#  sp1 sp2 sp3
#1   0   1   1
#2   1   1   1
#3   0   2   0
#4   0   1   1
#5   1   1   0

Or, as @GregorThomas and @LMc jointly suggest in the comments, a slightly more elegant approach uses any_of(), which ignores any row names that have no matching column:

df1 %>% 
  select(any_of(rownames(df2)))

#  sp1 sp2 sp3
#1   0   1   1
#2   1   1   1
#3   0   2   0
#4   0   1   1
#5   1   1   0
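
For anyone who wants to run the snippets above, here is a minimal sketch that rebuilds the dummy data from the question and shows why any_of() may be the safer choice over all_of(). The vector `wanted` and the extra name "sp9" are hypothetical; they are not in the original data and are added only to illustrate a row name with no matching column.

```r
library(dplyr)

# Dummy data copied from the question
df1 <- data.frame(
  sp1 = c(0, 1, 0, 0, 1),
  sp2 = c(1, 1, 2, 1, 1),
  sp3 = c(1, 1, 0, 1, 0),
  sp4 = c(2, 0, 1, 0, 1)
)
df2 <- data.frame(
  x1 = c(10, 11, 12),
  x2 = c(0.1, 0.5, 0.1),
  x3 = c(1, 2, 3),
  x4 = c("a", "b", "c"),
  row.names = c("sp1", "sp2", "sp3")
)

# Keep only the columns of df1 whose names are row names of df2
res <- df1 %>% select(any_of(rownames(df2)))
colnames(res)
# "sp1" "sp2" "sp3"

# Hypothetical case: one requested name ("sp9") is not a column of df1
wanted <- c(rownames(df2), "sp9")

# any_of() silently skips the missing name
lenient <- df1 %>% select(any_of(wanted))

# all_of() is strict and raises an error when a name is missing
strict <- tryCatch(
  df1 %>% select(all_of(wanted)),
  error = function(e) "all_of() raised an error"
)
```

If every row name of df2 is guaranteed to be a column of df1, all_of() is a reasonable choice because it fails loudly on a typo. Otherwise any_of() avoids the error.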
