I am a beginner in R and am trying to solve the following problem. I have 30 datasets to which I need to apply the same calculations. The datasets contain names, and for each dataset I have to find the names that appear in all of its columns. All datasets have 4 columns. For simplicity, let's assume that I have the following 3 datasets:

df1<- data.frame(x1=c("Ben","Alex","Tim", "Lisa", "MJ"), 
x2=c("Ben","Paul","Tim", "Linda", "Alex", "MJ"), 
x3=c("Tomas","Alex","Ben", "Paul", "MJ", "Tim"), 
x4=c("Ben","Alex","Tim", "Lisa", "MJ", "Ben"))

df2<- data.frame(x1=c("Alex","Tyler","Ben", "Lisa", "MJ"), 
x2=c("Ben","Paul","Tim", "Linda", "Tyler", "MJ"), 
x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
x4=c("Ben","Alex","Tim", "Lisa", "MJ", "Tyler"))

df3<- data.frame(x1=c("Lisa","Tyler","Ben", "Lisa", "MJ"), 
x2=c("Lisa","Paul","Tim", "Linda", "Tyler", "MJ"), 
x3=c("Tyler","Alex","Ben", "Tyler", "MJ", "Lisa"), 
x4=c("Ben","Alex","Tim", "Lisa", "MJ", "Tyler"))

My idea was to first extract every unique name in each dataset (the names differ between datasets and sometimes occur several times within one) and then check whether each unique name is included in every column of that dataset. To do this, I combined all datasets into a list using:

df_list<-list(df1,df2,df3)

Then I extracted the unique names in each dataset using:

unique_list <- lapply(df_list,  function(x) {
  as.vector(unique(unlist(x)))
})

Here is where I get stuck. I do not know how to compare the list of unique names with each column of each dataset. The way I would do it for each dataset separately is as follows:

u<-as.vector(unique(unlist(df1)))
n<- ifelse(u%in%df1$x1 & u%in%df1$x2 & u%in%df1$x3 & 
               u%in%df1$x4, 1, 0)
Names_1<-cbind(u, n) #values with a 1 are the names included in all columns of dataset

Is there a neat way to do the above calculation for all datasets at once?

Thanks a lot in advance!

ZayzayR
  • Try unique_list <- lapply(df_list, function(x) {Reduce(intersect, x)}) – Wave Aug 03 '20 at 19:38
  • Another way to solve the problem can be found here: [https://stackoverflow.com/questions/63247445/how-to-check-in-how-many-columns-character-can-be-found/63247993?noredirect=1#comment111842575_63247993](https://stackoverflow.com/questions/63247445/how-to-check-in-how-many-columns-character-can-be-found/63247993?noredirect=1#comment111842575_63247993) – ZayzayR Aug 04 '20 at 14:29
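
The suggestion in the first comment can be sketched in base R as follows. This is only a sketch under one assumption: the example columns are stored in plain lists rather than data frames, because the vectors in the question have different lengths (5 and 6), which data.frame() rejects. The name common_names is chosen here for illustration.

```r
# Example data from the question, held as plain lists (assumption: the
# columns differ in length, so data.frame() would not accept them as written).
df1 <- list(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
            x2 = c("Ben", "Paul", "Tim", "Linda", "Alex", "MJ"),
            x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ", "Tim"),
            x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Ben"))

df2 <- list(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
            x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler", "MJ"),
            x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
            x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))

df3 <- list(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
            x2 = c("Lisa", "Paul", "Tim", "Linda", "Tyler", "MJ"),
            x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ", "Lisa"),
            x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))

df_list <- list(df1 = df1, df2 = df2, df3 = df3)

# For each dataset, intersect its columns from left to right;
# what remains are the names present in every column.
common_names <- lapply(df_list, function(x) Reduce(intersect, x))

print(common_names)
```

For the example data this leaves Ben, Alex, Tim and MJ for df1, Tyler, Ben and MJ for df2, and Lisa, Tyler and MJ for df3. Because intersect() drops duplicates, repeated names within a column do not affect the result.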

1 Answer

Try it this way:

library(tidyverse)
library(janitor)
df1<- data.frame(x1=c("Ben","Alex","Tim", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Alex"), 
                 x3=c("Tomas","Alex","Ben", "Paul", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df2<- data.frame(x1=c("Alex","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df3<- data.frame(x1=c("Lisa","Tyler","Ben", "Lisa", "MJ"), 
                 x2=c("Ben","Paul","Tim", "Linda", "Tyler"), 
                 x3=c("Tyler","Alex","Ben", "Tyler", "MJ"), 
                 x4=c("Ben","Alex","Tim", "Lisa", "MJ"))

df <- bind_cols(df1, df2, df3) %>% clean_names()

uniq_name <- df %>% 
  pivot_longer(everything(), names_to = NULL) %>% 
  distinct() %>% 
  pull()

map(uniq_name, ~ colSums(df == .x) >= 1) %>% 
  map_lgl(all) %>% 
  as_tibble() %>% 
  add_column(uniq_name) %>% 
  filter(value)

# A tibble: 1 x 2
  value uniq_name
  <lgl> <chr>    
1 TRUE  Ben 
Yuriy Saraykin
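
The pipeline above binds all three datasets together, so it reports names found in every column of all datasets combined. If the check is wanted within each dataset separately, as the question describes, a base R sketch along these lines may help. It reuses the equal-length example data from this answer; in_all_columns is a hypothetical helper named here for illustration, not a function from any package.

```r
# Example data as used in the answer above (all columns have 5 entries).
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  stringsAsFactors = FALSE)

df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ"),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ"),
                  stringsAsFactors = FALSE)

df_list <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: for one dataset, flag each unique name with 1 if it
# occurs in every column and 0 otherwise (same idea as the ifelse() in the question).
in_all_columns <- function(d) {
  u <- unique(unlist(d, use.names = FALSE))
  # One logical vector per column, combined with a running AND.
  hit <- Reduce(`&`, lapply(d, function(col) u %in% col))
  data.frame(name = u, in_all = as.integer(hit), stringsAsFactors = FALSE)
}

# One indicator table per dataset.
names_list <- lapply(df_list, in_all_columns)

# Keep only the names flagged with 1 in each dataset.
common <- lapply(names_list, function(tab) tab$name[tab$in_all == 1])

print(common)
```

With this data the per-dataset result is Ben and Alex for df1 and only Ben for df2 and df3, which agrees with the combined result of Ben shown above.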