0

In a previous query, I wanted to find out if I could find a solution to repetitive process similar to SAS macro variables. The link is as follows:

R macros to enable user defined input similar to %let in SAS

However I was looking to take this a step forward by exploring the potential of specifying a character list instead of user inputting the values for the macro variable (calling it macro variable for convenience) every time.

For example, here is a short extract of the code i am working on which specifies the use of the macro variables using the paste0 function:

### Change metric between MSP, RSP, Val, Price MSP, Price RSP, Margin
### Change level between product and channel
### Change histyr, baseyr, futeyr according to year value
metric <- "RSP"
level <- "channel"
histyr <- "2009"
baseyr <- "2014"
futeyr <- "2019"
macro <- "gni"
macro1 <- "pce"

inputpath <- "C:/Projects/Consumption curves/UKPOV/excel files/"
outpath <- "C:/Projects/Consumption curves/UKPOV/output/gen_output/"


infile <- paste0(inputpath, metric, "_", level, "_CC_", histyr, "_salespercap.csv")
Category_sales <- read.csv(infile)
macroeco_data <- read.csv(infile2)

macroeco_data$Country <- str_trim(macroeco_data$Country)

sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                           FROM Category_sales L LEFT JOIN macroeco_data R
                           ON (L.Country = R.Country) order by GNI_PPP DESC")

Now instead of specifying each metric every time I was looking to create a character list or a character vector and use loops to run for every metric without manual intervention.

I tried the following but it doesnt seem to work. Not sure if I am doing this right

metric <- c("MSP", "RSP", "Vol", "PriceMSP", "PriceRSP", "Margin")

for (i in metric) {
  level <- "channel"
  histyr <- "2009"
  baseyr <- "2014"
  futeyr <- "2019"
  macro <- "gni"
  macro1 <- "pce"
  inputpath <- "C:/Projects/Consumption curves/UKPOV/excel files/"
  outpath <- "C:/Projects/Consumption curves/UKPOV/output/gen_output/"

  infile <- paste0(inputpath, metric[i], "_", level, "_CC_", histyr, "_salespercap.csv")
  Category_sales <- read.csv(infile)


  infile2 <- paste0(inputpath,"macro_",histyr,".csv")
  macroeco_data<- read.csv(infile2)

  macroeco_data$Country<-str_trim(macroeco_data$Country)

  sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                           FROM Category_sales L LEFT JOIN macroeco_data R
                           ON (L.Country = R.Country) order by GNI_PPP DESC")
}

The error is as below:

 metric<-c("MSP","RSP", "Vol","PriceMSP" ,"PriceRSP", "Margin")
> metric
[1] "MSP"      "RSP"      "Vol"      "PriceMSP" "PriceRSP" "Margin"  
> for(i in metric){
+ level<-"channel"
+ histyr<-"2009"
+ baseyr<-"2014"
+ futeyr<-"2019"
+ macro<-"gni"
+ macro1<-"pce"
+ inputpath<-"C:/Projects/Consumption curves/UKPOV/excel files/"
+ outpath<-"C:/Projects/Consumption curves/UKPOV/output/gen_output/"
+ infile <- paste0(inputpath,metric[i],"_",level,"_CC_",histyr,"_salespercap.csv")
+ Category_sales <- read.csv(infile)
+ infile2 <- paste0(inputpath,"macro_",histyr,".csv")
+ macroeco_data<- read.csv(infile2)
+ macroeco_data$Country<-str_trim(macroeco_data$Country)
+ sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
+ FROM Category_sales L LEFT JOIN macroeco_data R
+ ON (L.Country = R.Country) order by GNI_PPP DESC")
+ }
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:/Projects/Consumption curves/UKPOV/excel files/NA_channel_CC_2009_salespercap.csv': No such file or directory
> 
Community
  • 1
  • 1
user36176
  • 339
  • 1
  • 2
  • 11

1 Answers1

2

The following for loop:

for (x in y) {
  # do things
}

iterates over the elements of y, assigning each one in turn to the object x and executing the expressions contained within the loop.

In your example, for (i in metric), where metric is a character vector, the object i assumes the value of each element of metric in turn. That is, the first time through the loop, i is "MSP"; the second time through, i is "RSP", and so on. So later, where you refer to metric[i], the first time through the loop this is equivalent to metric["MSP"], which is of course NA (in the case of your unnamed vector). This in turn leads to the file name "C:/Projects/Consumption curves/UKPOV/excel files/NA_channel_CC_2009_salespercap.csv".

The fact that you refer to metric[i] suggests that you expected the values of i to be the indices of the elements of metric, i.e. the numbers 1 through 6. To achieve that behaviour, you'd typically use the following loop:

for (i in 1:length(metric)) {
  # do things
}

or, equivalently

for (i in seq_along(metric)) {
  # do things
}

Something like the following will probably do the trick:

metric <- c('MSP', 'RSP', 'Vol', 'PriceMSP', 'PriceRSP', 'Margin')
inputpath <- 'C:/Projects/Consumption curves/UKPOV/excel files/'
level <- 'channel'
histyr <- '2009'

macroeco_data <- read.csv(paste0(inputpath, 'macro_', histyr, '.csv'))
macroeco_data$Country <- str_trim(macroeco_data$Country)

for (x in metric) {
  f <- paste0(inputpath, x, '_', level, '_CC_', histyr, '_salespercap.csv')
  Category_sales <- read.csv(f)
  sales_nd_macroeco <- sqldf('SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                             FROM Category_sales L LEFT JOIN macroeco_data R
                             ON (L.Country = R.Country) order by GNI_PPP DESC')
})

Note that I've pulled everything out of the loop that doesn't need to be in there.

Also, be aware that the value of sales_nd_macroeco is overwritten each time through the loop, so the final value of that object will correspond to the metric "Margin". To instead return a list of objects, you could either iterate over the indices 1:6 with for (i in seq_along(metric)), assigning the result of sqldf to sales_nd_macroeco[[i]], where sales_nd_macroeco is now an empty list of length 6 that you define to begin with, or you can use lapply:

sales_nd_macroeco <- lapply(metric, function(x) {
  f <- paste0(inputpath, x, '_', level, '_CC_', histyr, '_salespercap.csv')
  Category_sales <- read.csv(f)
  sqldf('SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
         FROM Category_sales L LEFT JOIN macroeco_data R
         ON (L.Country = R.Country) order by GNI_PPP DESC')
}) 
jbaums
  • 27,115
  • 5
  • 79
  • 119
  • Thanks for the explanation. Works fine when I incorporate only one element in the vcetor like metric<-c("MSP"). THe moment I add two or more like metric<-c("MSP", "RSP") it shows the following error: **Error in file(file, ifelse(append, "a", "w")) : invalid 'description' argument In addition: Warning message: In if (file == "") file <- stdout() else if (is.character(file)) { : the condition has length > 1 and only the first element will be used** – user36176 Jul 18 '16 at 11:21
  • @user36176: try debugging with `file.exists(paste0(inputpath, metric, '_', level, '_CC_', histyr, '_salespercap.csv'))`. Are all values `TRUE`? – jbaums Jul 18 '16 at 12:02
  • Figured it out , I was missing an [i] in one of the places. Works perfect. Incorporated a second loop to change different values of "level" as well. :) – user36176 Jul 18 '16 at 18:51