
Is there a function in R that intersects a list of data frames with different numbers of columns and returns a list of data frames containing only the columns they have in common?

As an example, I have the following list:

ll <- list(structure(list(V1 = c(8L, 2L, 7L), V2 = c(1L, 9L, 3L), V3 = 4:6), .Names = c("V1", "V2", "V3"), row.names = c(NA, -3L), class = "data.frame"), structure(list(V1 = c(1L, 3L, 2L), V2 = c(5L, 4L, 6L)), .Names = c("V1", "V2"), row.names = c(NA, -3L), class = "data.frame"))

> ll
[[1]]
  V1 V2 V3
1  8  1  4
2  2  9  5
3  7  3  6

[[2]]
  V1 V2
1  1  5
2  3  4
3  2  6

The resulting list should give:

> new.ll
[[1]]
  V1 V2
1  8  1
2  2  9
3  7  3

[[2]]
  V1 V2
1  1  5
2  3  4
3  2  6

Thanks.

alaj
  • The matching is based only on column **names**? – talat Aug 25 '16 at 11:08
  • Does matching mean that all have the same number of columns, beginning from the first column? – Phann Aug 25 '16 at 11:10
  • Yes, matching is based on column names, and yes, matching means all have the same number of columns that are common. – alaj Aug 25 '16 at 11:13
  • @akrun: This is a question on how to manipulate a list of uneven data frames to create a new list where each data frame has the same set of columns. Intersect is used to find the common columns, but there is more to the question than what your reference case has to offer. – alaj Aug 25 '16 at 15:09

2 Answers


There may be a better alternative, but this is the only approach I can think of right now.

mincol <- Reduce(intersect, lapply(ll, colnames))
lapply(ll, function(x) x[mincol])

#[[1]]
#  V1 V2
#1  8  1
#2  2  9
#3  7  3

#[[2]]
#  V1 V2
#1  1  5
#2  3  4
#3  2  6

This finds the column names common to all data frames with intersect (folded over the list with Reduce) and then selects only those columns from every data frame in the list.
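
To make the two steps easier to follow, here is a minimal sketch that wraps them in a helper and runs it on three data frames whose columns are in different orders. The helper name keep_common_cols and the example data are made up for illustration and are not part of the question or the answer.

```r
# Hypothetical helper (not from the answer): keep only the columns whose
# names occur in every data frame of the list.
keep_common_cols <- function(dfs) {
  # intersect() takes two vectors; Reduce() folds it across all name vectors
  common <- Reduce(intersect, lapply(dfs, colnames))
  # single-bracket indexing with a character vector returns a data frame
  lapply(dfs, function(x) x[common])
}

# Made-up example data: three data frames, columns in different orders
dfs <- list(
  data.frame(V1 = 1:2, V2 = 3:4, V3 = 5:6),
  data.frame(V2 = 7:8, V1 = 9:10),
  data.frame(V4 = 0:1, V1 = 11:12, V2 = 13:14)
)

res <- keep_common_cols(dfs)
lapply(res, colnames)  # each element has the columns V1, V2 in that order
```

Because the selection is by name, the result uses the column order of the first data frame in the list, regardless of where the columns sit in the others.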

Ronak Shah

Here is a solution based not on the column names but on the number of columns (counting from the first column). All data frames are truncated to the smallest number of rows and columns found in any data frame in the list:

min_rows <- min(sapply(ll, nrow)) # min number of rows
min_cols <- min(sapply(ll, ncol)) # min number of cols
ll_new <- lapply(ll, function(y) y[seq_len(min_rows), seq_len(min_cols), drop = FALSE])
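
A positional cut like this only agrees with name-based matching when the shared columns come first and in the same order in every data frame. The sketch below, using made-up data that is not from the question, shows what happens when the order differs.

```r
# Made-up data: same column names, but in a different order
a <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
b <- data.frame(V2 = 1:2, V1 = 3:4)
dfs <- list(a, b)

n_rows <- min(sapply(dfs, nrow))  # smallest row count in the list
n_cols <- min(sapply(dfs, ncol))  # smallest column count in the list

# Keep the first n_rows rows and first n_cols columns of each data frame
by_position <- lapply(dfs, function(y) {
  y[seq_len(n_rows), seq_len(n_cols), drop = FALSE]
})

lapply(by_position, colnames)  # first element: V1, V2; second element: V2, V1
sapply(by_position, nrow)      # both truncated to 2 rows
```

Both results have the same dimensions, but the columns line up by position only, so the first column is V1 in one data frame and V2 in the other.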
Phann