First let me say that I am not an expert coder and any advice about this particular question or my general technique will be greatly appreciated.

I have a large data set made up of similar data frames named Table6.#, such as Table6.1, Table6.2, etc. Each data frame also has variables that repeat, such as ST1_Delta_PV%, ST2_Delta_PV%, etc. and ST1_Reallocation_Margin, ST2_Reallocation_Margin, etc.

I am trying to write several nested loops that will calculate values in each table across these similar variables. I have tried to do this with the paste function as shown below, but this is obviously not the correct way to do it.

for (i in 1:25){
 for (j in 1:4){
  for (k in 1:length(paste("Table6.",i,"sep="")[,1]){
  paste("Table6.",i,sep="")$paste("ST",j,"NonTgt_Shr",sep="")[k] <- paste("Table6.",i,sep="")$paste("ST",j,"_Delta_PV%",sep="")[k] * paste("Table6.",i,sep="")$paste("ST",j,"_Reallocation_Margin",sep="")[k]
  }
 }
}

I apologize if this is a complete mess. I appreciate your help.

Rees B.
  • Place the datasets in a `list`, loop through the list, and do the calculation. From your code, it is not that clear what you are trying to do. Anyway, the `paste` approach is not the way to go (especially the assignment). – akrun Aug 22 '16 at 15:56
  • [See here for making/working with lists of data frames](http://stackoverflow.com/a/24376207/903061). – Gregor Thomas Aug 22 '16 at 16:02

2 Answers

As akrun says, you should put your data frames in a list:

Tables <- list(Table6.1, Table6.2, …)

for (Table in Tables) { … }

This way, you do not need to use paste to construct the different Table names.

For accessing the different columns, you can use the df[["column"]] syntax. This is similar to df$column, except that inside the brackets you can use any string, including one stored in a variable.

nonTgt_Shr.column.name <- paste0("ST",j,"NonTgt_Shr")
delta.column.name <- paste0("ST",j,"_Delta_PV%")
for (k in seq_len(nrow(Table))) {
    # [[ ]] returns the column as a vector, so [k] picks row k
    Table[[nonTgt_Shr.column.name]][k] <- Table[[delta.column.name]][k] * …
}

Note how I use variables for storing the name, making the line with the actual computation much more readable. Also, nrow is more intuitive than length(Table[,1]).
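
One caveat with `for (Table in Tables)`: the loop variable is a copy, so changes made to it are not stored back in the list. A minimal sketch of one way around this is to index the list by position. The helper `make_table` and its values are made up for illustration, and only two tables and two ST prefixes are used:

```r
# Hypothetical toy data standing in for Table6.1 and Table6.2
make_table <- function(n) {
  data.frame(
    "ST1_Delta_PV%" = seq_len(n),
    "ST1_Reallocation_Margin" = rep(0.5, n),
    "ST2_Delta_PV%" = seq_len(n) * 2,
    "ST2_Reallocation_Margin" = rep(0.25, n),
    check.names = FALSE  # keep the "%" in the column names
  )
}
Tables <- list(Table6.1 = make_table(3), Table6.2 = make_table(2))

for (i in seq_along(Tables)) {
  for (j in 1:2) {
    out.name    <- paste0("ST", j, "NonTgt_Shr")
    delta.name  <- paste0("ST", j, "_Delta_PV%")
    margin.name <- paste0("ST", j, "_Reallocation_Margin")
    # Whole columns are multiplied at once, so no loop over rows is needed
    Tables[[i]][[out.name]] <- Tables[[i]][[delta.name]] * Tables[[i]][[margin.name]]
  }
}
```

Because the assignment goes through `Tables[[i]]`, the new columns end up in the list itself, and the vectorised multiplication replaces the innermost loop over k.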

Martin Nyolt

The calculations could be moved into a function, which improves readability, scalability, and robustness.

Inside the calculation function, get is used to retrieve the data frame from its name.

#Calculation Function

fn_CalcVariables <- function(
    tableName="Table6.1",
    outputVarName="NonTgt_Shr",
    inputVarNames=c("_Delta_PV%", "_Reallocation_Margin"),
    variablePrefix="ST1"
) {
    DF <- get(tableName)

    outputVarName <- paste0(variablePrefix, outputVarName)
    inputVarNames <- paste0(variablePrefix, inputVarNames)

    DF[,outputVarName] <- DF[,inputVarNames[1]] * DF[,inputVarNames[2]]

    return(DF)

}

This function should be called by nested lapply calls. lapply iterates over the elements of its first argument, calls the function (its second argument) on each element, and collects the return values in a list. (As an exercise, try l <- list(a=1, b=2); lapply(l, function(x) { x*2 }).)
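
For reference, the exercise can be run as follows. The sketch also shows that lapply returns a new list and leaves its input untouched, which is why the return value has to be kept if it is needed later (the name `doubled` is just for illustration):

```r
l <- list(a = 1, b = 2)

# Apply the function to every element and collect the results
doubled <- lapply(l, function(x) { x * 2 })

str(doubled)  # a list with the same names as l, each value doubled
str(l)        # the original list is unchanged
```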

#List object names for tables and variable names

tableNamesList <- paste0("Table6.",1:25)
variablePrefixList <- paste0("ST",1:4)

#Nested loops to invoke custom function from above
#results is a list (one element per prefix) of lists (one data frame per table)
results <- lapply(variablePrefixList, function(alpha) {

    lapply(tableNamesList, function(x, varprefix=alpha) {

        cat("Begin Processing Table",x,"varPrefix",varprefix,"\n")

        #Save the result so it can be returned after the final cat
        result <- fn_CalcVariables(
            tableName=x,
            outputVarName="NonTgt_Shr",
            inputVarNames=c("_Delta_PV%","_Reallocation_Margin"),
            variablePrefix=varprefix
        )
        cat("End Processing Table", x, "varPrefix", varprefix, "\n")

        result

    }) #End of inner lapply

}) #End of outer lapply
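
Note that each call to fn_CalcVariables starts from the original table, so the data frame it returns carries only one new column. If the goal is one data frame per table with the output columns for every prefix, one option is to fetch the tables with `mget` and apply each prefix in turn. A sketch under that assumption, with made-up data and only two tables and two prefixes (the names `tables` and `results` are illustrative):

```r
# Hypothetical toy tables; the values are invented for illustration
Table6.1 <- data.frame(
  "ST1_Delta_PV%" = c(1, 2), "ST1_Reallocation_Margin" = c(0.5, 0.5),
  "ST2_Delta_PV%" = c(3, 4), "ST2_Reallocation_Margin" = c(0.1, 0.1),
  check.names = FALSE  # keep the "%" in the column names
)
Table6.2 <- Table6.1[1, ]  # a second toy table with a single row

tableNamesList <- paste0("Table6.", 1:2)
variablePrefixList <- paste0("ST", 1:2)

# mget() returns a named list of the objects with the given names
tables <- mget(tableNamesList)

results <- lapply(tables, function(DF) {
  for (prefix in variablePrefixList) {
    DF[[paste0(prefix, "NonTgt_Shr")]] <-
      DF[[paste0(prefix, "_Delta_PV%")]] * DF[[paste0(prefix, "_Reallocation_Margin")]]
  }
  DF  # the modified copy becomes one element of the result list
})
```

Here `results` is a named list with one data frame per table, each holding the new columns for all prefixes. The original Table6.# objects are left unchanged.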
Silence Dogood
  • I can definitely see the value of creating a function. This uses some operations that I have not used before. I will have to do some reading about these. Thank you very much. – Rees B. Aug 23 '16 at 02:11
  • I added indentation and some explanations, hope that is okay for you. – Martin Nyolt Aug 23 '16 at 07:35
  • Thanks for the edits; they help users transition from loops to the *ply family. – Silence Dogood Aug 23 '16 at 09:46
  • Oh, and the result of the `fn_CalcVariables` should be saved and returned after the `cat`. (I'm not sure if I should edit your post again, or let you fix it. At least, this should be stated here as a comment so @ReesB. notices this error in the code.) – Martin Nyolt Aug 23 '16 at 10:20