R:How to intersect list of dataframes and specifc column

Question

I am trying to find all matching values in a specific column, in a list of data.frames. However, I keep getting a returned value of character(0).

I have tried the following: Simple subset (very time consuming) -> e.g. dat[[i]][[i]] lapply w/ Reduce and intersect (as seen here

LocA<-data.frame(obs.date=c("2018-01-10","2018-01-14","2018-01-20),
obs.count=c(2,0,1))
LocB<-data.frame(obs.date=c("2018-01-09","2018-01-14","2018-01-20),
obs.count=c(0,3,5))
LocC<-data.frame(obs.date=c("2018-01-12","2018-01-14","2018-01-19"),
obs.count=c(2,0,1))
LocD<-data.frame(obs.date=c("2018-01-11","2018-01-16","2018-01-21"),
obs.count=c(2,0,1))

dfList<-list(LocA,LocB,LocC,LocD)

##List of all dates 

lapply(dfList,'[[',1)
[1]"2018-01-10" "2018-01-14" "2018-01-20" "2018-01-09"...

Attempts (failure)

>Reduce(intersect,lapply(dfList,'[[',1))
character (0)

I expect the output of this function to be an output identifying the data.frames that share a common date.

*Extra smiles if someone know how to identify shared dates and mutate in to a single data frame where..Col1 = dataframe name, Col2=obs.date,Col3 = obs.count

score 1 · Accepted Answer · answered Jul 31 '19 at 10:55

1

You can first merge all the data frames so you only have one:

a <- Reduce(function(x, y) merge(x, y, all = TRUE), dfList)

Or you can merge them like this:

a <-rbind(LocA,LocB,LocC,LocD)

Afterwards, you can extract all the duplicates:

b <- a[duplicated(a$obs.date), ]

Or if you want to keep all the unique ones and keep the duplicates:

c <- a[!duplicated(a$obs.date), ]

answered Jul 31 '19 at 10:55

Elias

726
8
20

Thank you! This is exactly what I was looking for. Any idea if I can format the returned dataframe to list the same info, but where each column is a location? Like... date | LocA | LocB | LocC ... 01-01-2018 2 0 1 I am would like to run Poisson against the counts to see relations (if any) – Jul 31 '19 at 13:30
Nice to hear that I could help! No, sadly I don't know a handy solution to this. But you can try and add a variable called Location like this `Location = c("D", "D", "D")`. Then you can see the Locations. And then there is definitely a way to formate this into the data frame you want :) – Elias Jul 31 '19 at 15:06
`LocA<-data.frame(obs.date=c("2018-01-10","2018-01-14","2018-01-20"), obs.count=c(2,0,1), Location = c("A", "A", "A")) LocB<-data.frame(obs.date=c("2018-01-09","2018-01-14","2018-01-20"), obs.count=c(0,3,5), Location = c("B", "B", "B")) LocC<-data.frame(obs.date=c("2018-01-12","2018-01-14","2018-01-19"), obs.count=c(2,0,1), Location = c("C", "C", "C")) LocD<-data.frame(obs.date=c("2018-01-11","2018-01-16","2018-01-21"), obs.count=c(2,0,1), Location = c("D", "D", "D"))` – Elias Jul 31 '19 at 15:19

score 0 · Answer 2 · answered Jul 31 '19 at 10:44

If by "intersect" you mean doing an "inner join" or "merging" with a specific column as key, then -- you want to use dplyr::inner_join or merge.

First, between two data.frames:

library(dplyr)
inner_join(LocA, LocB, by='obs.date')
# 2 rows
inner_join(LocC, LocD, by='obs.date')
# zero rows

So, not infinite merging.

Stack, then count

We'll combine the data first, then count the occurences. Notice the use of the .id-argument to track where the row originated.

library(dplyr)
bind_rows(dfList, .id = 'id') %>%
  add_count(obs.date) %>% 
  filter(n > 1)
# A tibble: 5 x 4
  id    obs.date   obs.count     n
  <chr> <chr>          <dbl> <int>
1 1     2018-01-14         0     3
2 1     2018-01-20         1     2
3 2     2018-01-14         3     3
4 2     2018-01-20         5     2
5 3     2018-01-14         0     3

R:How to intersect list of dataframes and specifc column

2 Answers2

Stack, then count