
I have three data frames, df1, df2 and df3 consisting of 42, 103 and 414 columns respectively.

I want to see which columns all three data frames have in common / share so that I can establish how to join them.

I am aware that intersect() can determine which columns two objects share, but since it is limited to only two objects, a common workaround is to use Reduce(intersect, list(a, b, c)) to compare more than two.

However, when I try this, I receive an error about the number of columns:

> Reduce(intersect, list(colnames(df1), colnames(df2), colnames(df3)))
Error: not compatible: 
- different number of columns: 42 vs 103

Is there a workaround or alternative approach to this problem?

I have come across a number of similar questions, but none that match my problem closely, so I decided to ask this one.

Mus
  • Can you clarify what you mean by "which columns all three have in common"? Do you mean the set of column names the data frames share, or do you mean columns (with different names) that have shared values? Also, an [MWE would be helpful.](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – emilliman5 May 25 '21 at 15:26
  • I have updated the question (I changed my example code too because I forgot to include `colnames` before each `dfX` object). I want to see which columns all three data frames share. – Mus May 25 '21 at 15:35

2 Answers


Before using Reduce, extract the column names from each data frame, so that intersect is applied to character vectors rather than to the data frames themselves:

dfList <- list(df1, df2, df3)
dfColList <- lapply(dfList, names)
commonCols <- Reduce(intersect, dfColList)
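
As a minimal, self-contained sketch of the steps above: the three toy data frames below are made up for illustration (they stand in for the real df1, df2 and df3), and the final merge step is only one possible way to use the shared columns as join keys.

```r
# Hypothetical toy data frames standing in for df1, df2 and df3
df1 <- data.frame(id = 1:3, key = c("u", "v", "w"), a = 4:6)
df2 <- data.frame(id = 1:3, key = c("u", "v", "w"), b = 7:9)
df3 <- data.frame(id = 1:3, key = c("u", "v", "w"), c = 10:12)

# Collect the column names of each data frame into a list of character vectors
dfList <- list(df1, df2, df3)
dfColList <- lapply(dfList, names)

# Intersect the name vectors pairwise: (df1 with df2), then that result with df3
commonCols <- Reduce(intersect, dfColList)
print(commonCols)

# Assumption: the shared columns are suitable join keys for an inner join
joined <- Reduce(function(x, y) merge(x, y, by = commonCols), dfList)
print(joined)
```

Here the shared columns are id and key, and the joined result keeps them once alongside a, b and c.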
Jonas

To my dear friend @Anil Goyal who taught me how to use them:

You can also use the following purrr solution. First, map creates a list whose elements are the column names of the corresponding data frames. Then reduce applies intersect to them two at a time. Inside the formula passed to reduce, .x is the accumulated result so far and .y is the current element.

library(purrr)

# lst is the list of data frames given under "Sample Data" below
lst %>%
  map(~ .x %>% 
        names()) %>%
  reduce(~ intersect(.x, .y))

[1] "x"
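
To make the accumulated value visible at each step, here is a rough base R sketch using Reduce with accumulate = TRUE instead of purrr. The cols list is an assumption that simply mirrors the column names of the sample data in this answer.

```r
# Column names of the three sample data frames, written out by hand
cols <- list(
  df1 = c("x", "y"),
  df2 = c("x", "y2"),
  df3 = c("x", "y3")
)

# accumulate = TRUE keeps every intermediate result:
# element 1 is the first name vector, element 2 is the intersection of
# the first two, and element 3 is the intersection across all three
steps <- Reduce(intersect, cols, accumulate = TRUE)
print(steps)
```

The last element of steps is the same value the purrr pipeline returns.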

Sample Data

lst <- list(df1 = structure(list(x = 1:3, y = 4:6), class = "data.frame", row.names = c(NA, 
-3L)), df2 = structure(list(x = 1:3, y2 = 7:9), class = "data.frame", row.names = c(NA, 
-3L)), df3 = structure(list(x = 1:3, y3 = 10:12), class = "data.frame", row.names = c(NA, 
-3L)))
Anoushiravan R