1

I have a small problem but I cannot manage to figure it out. I have 2 list of multiple tibbles generated by using dplyr read.xlsx function. Now i want to merge the first element of each list with each other using the left_join function and by the name of a shared column, that means that the tibbles in both lists share a column named Study File ID. So i want to merge each element of each list with each other using left_join by=c(Study File ID= Study FIle ID) for instance. I know how to merge a single list with Tibble dataframes but not using each element of 2 lists. Hopefully someone can help me

files_all_compounds <- list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*CompoundsPerFile_ML*') 
input_files <-  list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*inputfiles_ML*') 

setwd("~/Internship/Internship/Script/Results - Excel files/Script_map")

data_all_compounds <- lapply(files_all_compounds,read_xlsx)
data_input_files <- lapply(input_files,read_xlsx)

...
MrFlick
  • 195,160
  • 17
  • 277
  • 295
Miklo
  • 13
  • 4
  • Does eatch data frame have the same columns? If so, I think you probably want to use `data.table::rbindlist` to collapse your list of data frames into a single one. – Mako212 Jan 03 '18 at 21:15
  • So to make it clearer, I used the first element of data_all_compounds and the first element of data_input_files and merge them by matching `Study File ID` and then the next in both lists etc. At the end i want to merge all the tibbles together to one big data_frame.I know its going to be a combination of the function, merge, Reduce, Map, left_join – Miklo Jan 03 '18 at 21:16
  • 1
    It's difficult to understand what you are trying to achieve without seeing a sample of the data concerned. You mention the use of a left-join, but I have my doubts about that if each Excel sheet is some kind of student result list. In the absence of any clarification in your question I reckon you'd need a master list of all students whose results are being read to act as the left-side join. Left-joining results against results is not going to be what you expect it to be. – Stewart Ross Jan 03 '18 at 21:24
  • @Mako212 No thats not possible, it is crucial that the the left_join happens first before all the dataframes are merged because the entries in the "Study File ID"could be reoccuring, like the entry "F12" refers to different files. So joining first and than merging all – Miklo Jan 03 '18 at 21:25
  • You could just customize your data load process to add a column indicating which file each row came from. – Mako212 Jan 03 '18 at 21:28
  • Or maybe you want `mapply(dplyr::left_join, data_all_compounds, data_input_files, by=c(...))` – Mako212 Jan 03 '18 at 21:32
  • Well i thought it would be an easy question, apparently i wasnt, I have multiple excel files (inputfiles and compound files) which i want to merge with corresponding "Study File ID" column values, thats it – Miklo Jan 03 '18 at 21:33
  • You know your data and I don't, but I would be very cautious about assuming a specific method and sequence (a left join first - of what against what?) is essential. You say it is crucial but without seeing a sample of the data there is no way I can see of determining what is and is not possible. – Stewart Ross Jan 03 '18 at 21:35
  • Yeah your absolutely right, but i'm rather new so I'm not familiar with the right procedure. I will have a look if i can copy the excel code or somehting or make screenshots,its just so frustrating, I know its something very easy but i just cannot grasp it – Miklo Jan 03 '18 at 21:38
  • You might find this post on [Creating a Great Reproducible Example in R](https://stackoverflow.com/a/5963610/4421870) helpful – Mako212 Jan 03 '18 at 21:44
  • I just found out that if i use merge it goes fine. So i want to apply the merge function to all the elements of the 2 list. So merge(list[1], otherlist[1]) and than merge(list[2],otherlist[2]) etc. So all the merged tibbles should be combined using bind_rows() so all the tibbles are glued together to one big tibble. Hope it became a bit clearer, i tried: mapply(function(X,Y) merge(X,Y),data_all_compounds,data_input_files) but that goes terribly wrong – Miklo Jan 04 '18 at 11:18
  • Okay i solved it using a for loop and merged the lists and add the ouput to a new list and than use the bind_rows function. – Miklo Jan 04 '18 at 12:32

0 Answers0