
I was wondering how to write the most efficient loop when more than one loop is needed.

  • My data files are named GXXX_Y.csv, where XXX is the group identifier (about 80 teams) and Y is the group member (1, 2, or 3). Note that some teams do not have all 3 members.
  • In each file, 10 variables need to be selected and processed one at a time (Step 3 below). Their names have the form VXX_r.

The following is the code that I plan to run for all groups and group members (shown here for Group 001), before compiling the results into one table for all teams (Step 6). I think the loop needs to iterate over the identifiers in Steps 1, 3, and 6 below.

I would really appreciate it if you could advise and help me on building efficient loops for this context!

library(data.table)  # provides fread()

# Step 1. Read the 3 members' data frames for Group 001
i1 <- fread("data/G001_1.csv")
i2 <- fread("data/G001_2.csv")
i3 <- fread("data/G001_3.csv")

# Step 2. Compute each member's trimmed length (largest multiple of 60 rows)
trimToLength1 <- floor( nrow(i1) / 60) * 60 
trimToLength2 <- floor( nrow(i2) / 60) * 60 
trimToLength3 <- floor( nrow(i3) / 60) * 60 

# Step 3. Select one variable at a time (here, V01_r) for each member
# seq_len() avoids the 1:0 pitfall when a file has fewer than 60 rows
x1 <- i1$V01_r[seq_len(trimToLength1)]
x2 <- i2$V01_r[seq_len(trimToLength2)]
x3 <- i3$V01_r[seq_len(trimToLength3)]

# Step 4. Compute the correlation matrix for each pair of members in Group 001
wcc1 <- calcWCC(x1, x2, maxLag=60, winSize=30, windowInc=30, lagInc=30)
wcc2 <- calcWCC(x1, x3, maxLag=60, winSize=30, windowInc=30, lagInc=30)
wcc3 <- calcWCC(x2, x3, maxLag=60, winSize=30, windowInc=30, lagInc=30)

# Step 5. Extract two summaries (mean and peak) from each matrix above
sync1a <- aggWCC(wcc1, method="mean")
sync1b <- aggWCC(wcc1, method="peak")
sync2a <- aggWCC(wcc2, method="mean")
sync2b <- aggWCC(wcc2, method="peak")
sync3a <- aggWCC(wcc3, method="mean")
sync3b <- aggWCC(wcc3, method="peak")

# Step 6. Collect this team's results in a table (to be combined across all teams)
result_table <- data.frame(
  TeamID = "G001",
  From   = c("x1", "x1", "x2"),
  To     = c("x2", "x3", "x3"),
  Mean   = c(sync1a, sync2a, sync3a),
  Peak   = c(sync1b, sync2b, sync3b)
)
user14250906
  • It looks like you should make a function that processes one file, and then apply that in a loop, or my preference, with purrr::map like this: https://www.gerkelab.com/blog/2018/09/import-directory-csv-purrr-readr/ – Jon Spring Mar 12 '21 at 21:30
  • *"Would you recommend making a list of the data files and the variable names first?"* yes, and then use a `list` of data frames. See [my answer here](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for discussion and examples. Note that nothing here is about efficiency in terms of computation time - that will be basically the same. But it will make the code you write shorter, clearer, and more scalable. – Gregor Thomas Mar 12 '21 at 21:35
  • @Gregor Thomas Thank you for your suggestion. There are multiple teams and multiple team members identified only by the file names (that is, there are no identifiers within the file). Would it be possible to split the list according to the file name? – user14250906 Mar 12 '21 at 21:51
  • Sure, the filename is just a string. You can break it apart into the parts that identify team and team member and do whatever you want with it. See `?strsplit`, or `tidyr::separate`. – Gregor Thomas Mar 15 '21 at 13:34
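
Putting the comments together, here is a minimal sketch of one way this could look, not a definitive implementation. It splits each file name into team and member with `strsplit`, reads each team's files into a named list of data frames, and loops over teams, variables, and member pairs (via `combn`, so teams with only 2 members yield one pair and teams with 1 member are skipped). The helper names `parse_file_names`, `trim_var`, `sync_one`, and `sync_all` are hypothetical, and `pair_fun` / `agg_fun` are placeholders for `calcWCC` and `aggWCC` from the question, which are assumed to be loaded already.

```r
library(data.table)  # fread()

# Split "G001_2.csv" into team "G001" and member "2" (naming scheme from the question).
parse_file_names <- function(paths) {
  stem  <- sub("\\.csv$", "", basename(paths))
  parts <- strsplit(stem, "_", fixed = TRUE)
  data.frame(
    path   = paths,
    team   = vapply(parts, `[`, character(1), 1),
    member = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Steps 2-3: take one column, trimmed to the largest multiple of `block` rows.
trim_var <- function(dt, var, block = 60) {
  dt[[var]][seq_len(floor(nrow(dt) / block) * block)]
}

# Steps 4-6 for one team and one variable: one row per pair of members present.
# `pair_fun` and `agg_fun` are stand-ins for calcWCC() and aggWCC().
sync_one <- function(team, var, members, pair_fun, agg_fun) {
  if (length(members) < 2) return(NULL)  # nothing to pair
  pairs <- combn(names(members), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    wcc <- pair_fun(trim_var(members[[p[1]]], var),
                    trim_var(members[[p[2]]], var))
    data.frame(TeamID = team, Variable = var, From = p[1], To = p[2],
               Mean = agg_fun(wcc, method = "mean"),
               Peak = agg_fun(wcc, method = "peak"))
  })
  do.call(rbind, rows)
}

# Loop over all teams and all variables, then stack everything into one table.
sync_all <- function(paths, vars, pair_fun, agg_fun) {
  files   <- parse_file_names(paths)
  by_team <- split(files, files$team)
  rows <- lapply(names(by_team), function(team) {
    f <- by_team[[team]]
    members <- setNames(lapply(f$path, fread), f$member)  # list of data frames
    do.call(rbind, lapply(vars, function(v)
      sync_one(team, v, members, pair_fun, agg_fun)))
  })
  do.call(rbind, rows)
}

# Usage (assumes calcWCC() and aggWCC() are available in the session):
# paths <- list.files("data", pattern = "^G[0-9]{3}_[0-9]\\.csv$", full.names = TRUE)
# vars  <- c("V01_r")  # extend with the other nine variable names
# result_table <- sync_all(
#   paths, vars,
#   pair_fun = function(a, b) calcWCC(a, b, maxLag = 60, winSize = 30,
#                                     windowInc = 30, lagInc = 30),
#   agg_fun  = aggWCC
# )
```

In this sketch the `From` and `To` columns hold the member numbers taken from the file names rather than `x1`, `x2`, `x3`, and an extra `Variable` column records which of the 10 variables each row belongs to. As noted in the comments, this is about shorter, more scalable code rather than faster computation.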
