I am a beginner in R and am trying to solve the following problem. I have 30 datasets to which I need to apply the same calculations. The datasets contain names, and I have to find the names that appear in all columns within each dataset. All datasets have 4 columns. For simplicity, let's assume that I have the following 3 datasets:
# data.frame() requires columns of equal length, so shorter columns are padded with NA
df1 <- data.frame(x1 = c("Ben", "Alex", "Tim", "Lisa", "MJ", NA),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Alex", "MJ"),
                  x3 = c("Tomas", "Alex", "Ben", "Paul", "MJ", "Tim"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Ben"))
df2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "Lisa", "MJ", NA),
                  x2 = c("Ben", "Paul", "Tim", "Linda", "Tyler", "MJ"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ", NA),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))
df3 <- data.frame(x1 = c("Lisa", "Tyler", "Ben", "Lisa", "MJ", NA),
                  x2 = c("Lisa", "Paul", "Tim", "Linda", "Tyler", "MJ"),
                  x3 = c("Tyler", "Alex", "Ben", "Tyler", "MJ", "Lisa"),
                  x4 = c("Ben", "Alex", "Tim", "Lisa", "MJ", "Tyler"))
My idea was to first extract every unique name in each dataset (the names differ between datasets and sometimes occur several times within one) and then check whether each of these unique names is included in every column of that dataset. To that end, I have already combined all datasets into a list using:
df_list<-list(df1,df2,df3)
Then I extracted the unique names in each dataset using:
unique_list <- lapply(df_list, function(x) {
as.vector(unique(unlist(x)))
})
Here is where I get stuck. I do not know how to compare the list of unique names with each column of each dataset. The way I would do it for each dataset separately is as follows:
u<-as.vector(unique(unlist(df1)))
n <- ifelse(u %in% df1$x1 & u %in% df1$x2 & u %in% df1$x3 &
            u %in% df1$x4, 1, 0)
Names_1<-cbind(u, n) #values with a 1 are the names included in all columns of dataset
Is there any nice way to do the above calculation for all datasets at once?
Thanks a lot in advance!
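One possible way to generalize the per-dataset check is sketched below. It is only a sketch under a few assumptions: `common_names` is a hypothetical helper name, the small datasets `d1` and `d2` are made up for illustration (they are not the datasets from the question), and shorter columns are assumed to be padded with NA. The idea is that a name is in every column exactly when it is in the intersection of all columns, which `Reduce(intersect, ...)` computes, and `lapply` then applies that to each dataset in the list.

```r
# Hypothetical helper: names that occur in every column of one data frame
common_names <- function(df) {
  # Turn each column into a vector of unique, non-missing names
  cols <- lapply(df, function(col) {
    col <- as.character(col)
    unique(col[!is.na(col)])
  })
  # Intersect the columns one after another
  Reduce(intersect, cols)
}

# Small illustrative datasets (assumed data; NA pads the shorter column)
d1 <- data.frame(x1 = c("Ben", "Alex", "Tim", NA),
                 x2 = c("Ben", "Paul", "Tim", "Alex"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(x1 = c("Alex", "Tyler", "Ben", "MJ"),
                 x2 = c("Ben", "Paul", "Tyler", "MJ"),
                 stringsAsFactors = FALSE)

d_list <- list(d1 = d1, d2 = d2)

# One vector of shared names per dataset
common_list <- lapply(d_list, common_names)
```

The same call should work on a list of 30 data frames with 4 columns each, since `Reduce` does not depend on the number of columns. If a 0/1 indicator per unique name is preferred over a vector of shared names, the result can be compared against the unique names with `%in%`.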