I have several data frames in the format shown below. I want to join/merge the data frames by species, extracting the kmers column from each, so that the output contains one column with species and multiple kmers columns, one from each of the files. Each kmers column should then be given the name of the file from which it originated.
df1
reads taxReads kmers species
232 2323 23234 Bacteria
555 12 4545 Virus
df2
reads taxReads kmers species
12 23 56 Bacteria
932 1213 12 Virus
out
species df1 df2
Bacteria 23234 56
Virus 4545 12
I have tried making a script using join_all, but it does not select the correct column (kmers):
library(plyr)  # provides join_all

file_list = list.files(pattern="tsv$")
datalist = lapply(file_list, function(x){
  dat = read.csv(file=x, header=T, sep = "\t")
  names(dat)[2] = x
  return(dat)
})
joined <- join_all(dfs = datalist, by = "species", type = "full")
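For reference, here is a minimal sketch of the result I am after, written in base R with `merge` and `Reduce` instead of join_all (that swap is my own assumption, not a requirement). The two data frames are typed in by hand from the example above rather than read from the tsv files, and the list names `df1` and `df2` stand in for the file names.

```r
# Example inputs copied from the tables above, held in a named list.
# With real files the list could be built with lapply(file_list, read.delim)
# and named with file_list (assumption: every file has the same four columns).
dfs <- list(
  df1 = data.frame(reads = c(232, 555), taxReads = c(2323, 12),
                   kmers = c(23234, 4545), species = c("Bacteria", "Virus")),
  df2 = data.frame(reads = c(12, 932), taxReads = c(23, 1213),
                   kmers = c(56, 12), species = c("Bacteria", "Virus"))
)

# Keep only species and kmers, and rename kmers after the list element.
trimmed <- lapply(names(dfs), function(nm) {
  d <- dfs[[nm]][, c("species", "kmers")]
  names(d)[names(d) == "kmers"] <- nm
  d
})

# Full outer join of all trimmed data frames on species.
out <- Reduce(function(a, b) merge(a, b, by = "species", all = TRUE), trimmed)
print(out)
```

Selecting the column by name rather than by position is the main difference from my attempt, where `names(dat)[2]` refers to the second column (taxReads) and all columns are carried into the join.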