I posted last week about how to replace values in one dataframe with values from another dataframe when conditions are met, in order to get my desired result on a single set of data (which I have since solved). Now I am trying to create a loop that can be run on all the files I have in a folder.
Briefly, each set of data has a matching pair of .tsv files: one with the raw data, and one with the means of the replicates (specified in the instrument software prior to exporting the .tsv files). An example pairing would be "072621Liver1.tsv" (the raw data file) and "072621Liver1_replicates.tsv" (the means data). My previous post describes how I created a single tibble from the two files.
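To make the pairing scheme concrete, here is a minimal sketch in base R, assuming every raw file follows the "name.tsv" / "name_replicates.tsv" convention described above. The scratch folder, the second (Lung) pair, and the `pairs` data frame are hypothetical and only there for illustration.

```r
# Hypothetical file names in a scratch folder; only the naming scheme comes from the post.
demo_dir <- file.path(tempdir(), "pairing_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
  "072621Liver1.tsv", "072621Liver1_replicates.tsv",
  "072621Lung1.tsv",  "072621Lung1_replicates.tsv"
))))

all_tsv <- list.files(demo_dir, pattern = "\\.tsv$")

# "_replicates\\.tsv$" matches the literal suffix; square brackets would
# instead define a set of single characters.
is_rep <- grepl("_replicates\\.tsv$", all_tsv)
raw_files <- all_tsv[!is_rep]

# Derive each partner name from the raw name so the pairs cannot drift apart.
pairs <- data.frame(
  raw        = raw_files,
  replicates = sub("\\.tsv$", "_replicates.tsv", raw_files),
  stringsAsFactors = FALSE
)
pairs$has_partner <- pairs$replicates %in% all_tsv
print(pairs)
```

Building the second name from the first, instead of listing both sets separately and trusting that they line up, also makes it easy to spot a raw file whose replicates file is missing.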
Now I am struggling to batch process all the paired files in my data set. My best attempt so far is posted below, but I still get error messages like this one:
Error: Cannot open file for writing: 'C:\Users\asmit\Desktop\pratice_files\072621Liver1.tsv'
If I don't get that message and the code runs, the .csv files I am trying to write are not created: they don't show up in the folder after the script finishes. I know something is off, but I can't put my finger on what exactly it is. The script for my best attempt so far is below. Any help getting it to actually run would be really appreciated! I feel like I'm close to the answer but can't quite get there.
###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
###Export modified Replicates tibble to new .csv file.
##environment setup; change folder accordingly. Install tidyverse if needed.
#setwd("C:/Users/asmit/Desktop/pratice_files/pratice/")
##install.packages("tidyverse")
library(tidyverse)
### lists of all sets of files (lists are the same length)
singlet_files <- list.files(path = ".", pattern = "[^replicates]\\.tsv")
singlet_cleaned <- list.files(path = ".", pattern = "[_cleaned]\\.csv")
matching_pair_files <- list.files(path = ".", pattern = "[replicates]\\.tsv")
tibble_singlet <- function(x) { ###function to create tibble from singlet files
  cleanup_tibble <- as_tibble(read_tsv(x, col_names = TRUE, skip = 1))
}
singlet_cleanup <- function(x) { ##function to clean singlet files
  new_file <- str_replace(x, "[.*].tsv", "_cleaned.csv")
  tibble_singlet(x) %>%
    select("Pos", "Name", "Cp", "Concentration") %>%
    .[str_order(.$Pos, numeric = TRUE),] %>%
    write_csv(file = new_file)
}
lapply(singlet_files, singlet_cleanup) ## <- run (singlet_cleanup) on files in singlet_files.
##I get the error code here. If I skip over this part
##and only run the second half (below) it works,
##but I don't get any output from it.
cleaned_tibble <- function(y) { ##function to read cleaned .csv files as tibble
  Pos_tibble <- as_tibble(read_csv(y, col_names = TRUE))
}
match <- function(m){ ##function to make tibble of replicate file
  match_tibble <- as_tibble(read_tsv(m, col_names = TRUE, skip = 1))
}
merged <- function(m,y){ ##function to merge match tibble with specific column of cleaned_tibble tibble
  organ <- regmatches(m, regexpr("(Liver|Lung|Kidney|Spleen)", m))
  output_file <- gsub(".*replicates.tsv", ".*final.csv", m)
  match(m) %>%
    mutate("R1" = gsub(x = .$Samples, pattern = "^(.*),.*", replacement = "\\1")) %>%
    mutate("R2" = gsub(x = .$Samples, pattern = ".*,\\s(.*)", replacement = "\\1")) %>%
    pivot_longer(cols = c("R1", "R2"), names_to ="Well Pairs", values_to = "Wells") %>%
    select("MeanCp", "STD Cp", "Mean conc", "STD conc", "Wells") %>%
    relocate("Wells", 1) %>%
    right_join((cleaned_tibble(y)), by = c("Wells"="Pos")) %>%
    .[str_order(.$Wells, numeric = TRUE),] %>%
    select("Name", "MeanCp", "STD Cp", "Mean conc", "STD conc") %>%
    distinct(Name, .keep_all = TRUE) %>%
    add_column(Organ = organ) %>%
    write_csv(file = output_file)
}
map2(m=matching_pair_files, y=singlet_cleaned, ~merged(m,y)) ##I feel like this isn't correct,
##but don't know how to fix it to
##actually process correctly
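A note on the map2() call above: purrr's map2() takes its two inputs positionally (.x and .y) followed by the function, so arguments named m= and y= are not treated as the vectors to iterate over. The sketch below shows the same pairwise pattern with base R's Map(), so it runs without any packages. merge_pair() and the file names are hypothetical stand-ins for merged() and the real files. It also derives the cleaned names from the replicate names, on the assumption that singlet_cleaned may be empty if list.files() runs before the _cleaned.csv files exist.

```r
# Hypothetical stand-in for merged(): describes the pair it would process
# and the output name it would write, without touching any files.
merge_pair <- function(m, y) {
  out <- sub("_replicates\\.tsv$", "_final.csv", m)
  paste(m, "+", y, "->", out)
}

replicate_files <- c("072621Liver1_replicates.tsv", "072621Lung1_replicates.tsv")

# Derived from the replicate names rather than listed from disk, so the vector
# is never empty just because the cleaned files had not been written yet.
cleaned_files <- sub("_replicates\\.tsv$", "_cleaned.csv", replicate_files)

# Map() walks both vectors in parallel: element 1 with element 1, and so on.
# With purrr loaded, the equivalent call would be
# map2(replicate_files, cleaned_files, merge_pair).
results <- Map(merge_pair, replicate_files, cleaned_files)
print(unname(unlist(results)))
```

If the formula shorthand is preferred with purrr, the placeholders inside it are .x and .y, as in ~merged(.x, .y), not the argument names of merged().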
EDIT: Breaking the code up into two parts, with the corrected regex and the error messages I get.
First part (which now works thanks to @scrameri)
###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
###Export modified Replicates tibble to new .csv file.
##environment setup; change folder accordingly. Install tidyverse if needed.
#setwd("C:/Users/asmit/Desktop/pratice_files/pratice/")
##install.packages("tidyverse")
library(tidyverse)
### lists of all sets of files (lists are the same length)
singlet_files <- list.files(path = ".", pattern = "[^replicates]\\.tsv")
singlet_cleaned <- list.files(path = ".", pattern = "[_cleaned]\\.csv")
matching_pair_files <- list.files(path = ".", pattern = "[replicates]\\.tsv")
tibble_singlet <- function(x) { ###function to create tibble from singlet files
  cleanup_tibble <- as_tibble(read_tsv(x, col_names = TRUE, skip = 1))
}
singlet_cleanup <- function(x) { ##function to clean singlet files
  new_file <- str_replace(x, "(.*).tsv", "\\1_cleaned.csv")
  tibble_singlet(x) %>%
    select("Pos", "Name", "Cp", "Concentration") %>%
    .[str_order(.$Pos, numeric = TRUE),] %>%
    write_csv(file = new_file)
}
lapply(singlet_files, singlet_cleanup) ##run (singlet_cleanup) on files in singlet_files
#> list()
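For anyone comparing the two versions of new_file, here is a small sketch of why the original pattern left the file name unchanged. It uses base R's sub() rather than str_replace() so it runs without packages; for these patterns the two should behave the same. If the name is unchanged, write_csv() is pointed at the very .tsv file that was just read, which may be what triggers "Cannot open file for writing" (an assumption, not something confirmed in this post).

```r
x <- "072621Liver1.tsv"

# Square brackets make a character class: "[.*]" matches ONE literal "." or "*".
# The rest of the pattern then needs one more character followed by "tsv",
# which this name does not contain, so nothing is replaced.
unchanged <- sub("[.*].tsv", "_cleaned.csv", x)

# Parentheses make a capture group that "\\1" can reuse in the replacement.
renamed <- sub("(.*)\\.tsv$", "\\1_cleaned.csv", x)

print(unchanged)  # same as x, so the output path would be the input file itself
print(renamed)
```

Escaping the dot and anchoring with $ in the second pattern is a small tightening over "(.*).tsv"; both give the same result for file names like the ones here.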
Second part
cleaned_tibble <- function(y) { ##function to read cleaned .csv files as tibble
  Pos_tibble <- as_tibble(read_csv(y, col_names = TRUE))
}
match <- function(m){ ##function to make tibble of replicate file
  match_tibble <- as_tibble(read_tsv(m, col_names = TRUE, skip = 1))
}
merged <- function(m,y){ ##function to merge match tibble with specific column of cleaned_tibble tibble
  organ <- regmatches(m, regexpr("(Liver|Lung|Kidney|Spleen)", m))
  output_file <- str_replace(m, "(.*)_replicates.tsv", "\\1_final.csv")
  match(m) %>%
    mutate("R1" = gsub(x = .$Samples, pattern = "^(.*),.*", replacement = "\\1")) %>%
    mutate("R2" = gsub(x = .$Samples, pattern = ".*,\\s(.*)", replacement = "\\1")) %>%
    pivot_longer(cols = c("R1", "R2"), names_to ="Well Pairs", values_to = "Wells") %>%
    select("MeanCp", "STD Cp", "Mean conc", "STD conc", "Wells") %>%
    relocate("Wells", 1) %>%
    right_join((cleaned_tibble(y)), by = c("Wells"="Pos")) %>%
    .[str_order(.$Wells, numeric = TRUE),] %>%
    select("Name", "MeanCp", "STD Cp", "Mean conc", "STD conc") %>%
    distinct(Name, .keep_all = TRUE) %>%
    add_column(Organ = organ) %>%
    write_csv(file = output_file)
}
map2(m=matching_pair_files, y=singlet_cleaned, merged(m,y))
#> Error in map2(m = matching_pair_files, y = singlet_cleaned, merged(m, : could not find function "map2"
Created on 2021-09-22 by the reprex package (v2.0.1)