
I have 48 scripts that clean data for 48 different tests. The cleaning protocol for each test used to be unique and test-specific, but the final project guideline now allows all tests to use the same cleaning protocol, provided they save all output files to the appropriate directory (each test's own results folder). I'm trying to combine these scripts into one master cleaning script that any team member can use to clean data as more is collected, or to make small changes, given that they have the raw data files and a folder for each test (which I would give them).

So far I have tried two approaches. The first is to load all necessary libraries in the body of a master cleaning script, then source() each individual cleaning script. Inside each script, the libraries are require()d, the appropriate files are read in, and the cleaned files are saved to their correct destination. This method seems to work best, but when the whole script is run, only some subtests are successfully cleaned and saved to their correct locations; the rest need to be run and saved individually, and I'm not sure why.

library(readr)
library(dplyr)
library(data.table)
library(lubridate)

source("~/SF_Cleaning_Protocol.R")
etc
.
.
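
One way to find out why only some scripts finish is to source each one inside tryCatch() and record the outcome. The following is a minimal sketch, not the actual project code: the two script files are hypothetical stand-ins written to a temporary folder, and the file-name pattern is an assumption based on the single script name shown above.

```r
# Hypothetical stand-ins for the per-test cleaning scripts, written to a
# temporary folder so the sketch runs on its own.
script_dir <- file.path(tempdir(), "cleaning_scripts")
dir.create(script_dir, showWarnings = FALSE)
writeLines("result <- 1 + 1",
           file.path(script_dir, "SF_Cleaning_Protocol.R"))
writeLines("stop('input file not found')",
           file.path(script_dir, "EV_Cleaning_Protocol.R"))

# Collect every script that follows the assumed naming pattern.
scripts <- list.files(script_dir, pattern = "_Cleaning_Protocol\\.R$",
                      full.names = TRUE)

# Source one script in a fresh environment so objects left over from an
# earlier script cannot leak into it; return "ok" or the error message.
run_script <- function(path) {
  tryCatch({
    source(path, local = new.env())
    "ok"
  }, error = function(e) conditionMessage(e))
}

status <- vapply(scripts, run_script, character(1))
names(status) <- basename(scripts)
print(status)
```

Because each failure is caught and stored, the loop continues past a broken script, and `status` shows which tests need attention and why.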

The second is to save the body of the general cleaning script as a function, and then call that function in a series of if statements based on the test one wants to clean. For example:

if (testname == "SF") {
  setwd("~/SF")
  # read in the csv files
  subtest  <- read_csv()
  path_map <- read_csv()
  SpecIDs  <- read_csv()

  CleaningProtocol(subtest, path_map, SpecIDs)

  write.csv("output1.csv")
  write.csv("output2.csv")
  write.csv("output3.csv")
  write.csv("output4.csv")

} else if (testname == "EV"){
etc
}
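
The if/else chain could instead be collapsed into a single function that takes the test name and builds the paths from it. This is only a sketch under stated assumptions: the `CleaningProtocol()` below is a one-argument hypothetical stand-in (the real one takes three inputs and is not shown), `clean_test()`, `raw.csv`, the `score` column, and the output names are all invented for illustration, and base R `read.csv()` is used in place of `read_csv()` so the block runs without extra packages.

```r
# Hypothetical stand-in for the shared cleaning function: it returns its
# results as a named list instead of assigning them with <<-.
CleaningProtocol <- function(subtest) {
  list(
    complete   = subset(subtest, score >= 0.75),
    incomplete = subset(subtest, score < 0.75)
  )
}

# Clean one test: read from its folder, clean, write every result back
# to the same folder. No setwd() is needed because paths are explicit.
clean_test <- function(testname, root) {
  test_dir <- file.path(root, testname)
  subtest  <- read.csv(file.path(test_dir, "raw.csv"))
  results  <- CleaningProtocol(subtest)
  for (name in names(results)) {
    write.csv(results[[name]],
              file.path(test_dir, paste0(name, ".csv")),
              row.names = FALSE)
  }
  invisible(results)
}

# Demo data: one folder per test, each holding a small made-up raw file.
root <- file.path(tempdir(), "tests")
for (testname in c("SF", "EV")) {
  dir.create(file.path(root, testname), recursive = TRUE,
             showWarnings = FALSE)
  write.csv(data.frame(id = 1:4, score = c(0.9, 0.5, 0.8, 0.1)),
            file.path(root, testname, "raw.csv"), row.names = FALSE)
}

# Clean a single test or loop over several.
for (testname in c("SF", "EV")) clean_test(testname, root)
```

With this layout, adding a 49th test means adding a folder rather than another `else if` branch.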

The code reads in and writes out files fine when each block is run individually, but when testname is specified and the script is run as a whole, it ignores the if statements, runs all tests, but fails to write results for any.

Is there a better option I haven't tried, or can anyone help me diagnose my issues? Many thanks.

  • *it ignores the if statements, runs all tests*... I can only see this happening if you run both solutions together: `source` each file (which effectively runs it) AND multiple `if` blocks. Did you separate the two? Please show the definition of `CleaningProtocol()`. – Parfait Mar 23 '20 at 23:59
  • The two are in separate scripts, so they aren't being run together. `CleaningProtocol()` is quite long, perhaps too long to show. Without context, the results are assigned as global variables: `test<<-cbind(cutData,diffGendm,diffGendf)`, `x.sub8$Pilot_path_ID<<-matrix(unlist(x.sub8$Pilot_path_ID))`, `incompletesubtest <<- subset(x.sub8, Subtest_complete_.75 ==0)`, `data <<- as.data.frame(matrix(0, ncol = 14, nrow = 11))`, which are then passed to `write.csv()` outside of the function. – Connor Cheek Mar 25 '20 at 19:57
  • Please do not post code in comments. Instead, [edit](https://stackoverflow.com/posts/60822116/edit) your post with such code. Then, delete this comment. Use of scoping assignment `<<-` may be the issue as it adjusts global environment objects and should rarely be used (if at all) within a function. – Parfait Mar 25 '20 at 20:19
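
The concern about `<<-` raised in the comment above can be illustrated with a small sketch. The function names and the data are hypothetical and are not taken from the real `CleaningProtocol()`; the point is only that a global assignment is shared by every call, while a returned value is kept per call.

```r
# Version 1: the result is pushed into the global environment with <<-.
clean_global <- function(x) {
  cleaned <<- x[!is.na(x)]
}

# Version 2: the result is returned to the caller.
clean_return <- function(x) {
  x[!is.na(x)]
}

# Two "tests" cleaned with the global version: the second call silently
# overwrites the object that the first call created.
clean_global(c(1, NA, 3))   # stands in for test SF
clean_global(c(NA, 5))      # stands in for test EV
print(cleaned)              # only the EV result is left

# With return values, each test keeps its own result.
results <- list(
  SF = clean_return(c(1, NA, 3)),
  EV = clean_return(c(NA, 5))
)
print(results)
```

If the real function assigns several objects this way, returning them together in a named list is one way to hand them to `write.csv()` without relying on global state.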
