R, creating variables on the fly in a list using assign statement

Question

I want to create variable names on the fly inside a list and assign them values in R, but I am unable to get the desired result. Here is the logic of my code:

Upon the function call dat_in <- readf(1,2), an input file is read based on a product and a site. After reading, a particular column (the 13th here) is assigned to a variable aot500. I want the function to return this variable under a name specific to each combination of product and site. For example, I need the list returned by the function to contain variables named aot500.AF, aot500.CM, aot500.RB. I am having trouble with the return statement: there is no error, but there is nothing in dat_in, whereas I expect it to have dat_in$aot500.AF etc. Please tell me what is wrong in the return statement. Furthermore, I want to read the files for all combinations in a single call to the function, say using a for loop, and I wonder how the return statement would handle a list of more variables.

prod <- c('inv','tot')
site <- c('AF','CM','RB')
readf <- function(pp, kk) {
            fname.dsa <- paste("../data/site_data_",prod[pp],"/daily_",site[kk],".dat",sep="")
            inp.aod <- read.csv(fname.dsa,skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
            aot500 <- inp.aod[,13]
            return(list(assign(paste("aot500",site[kk],sep="."),aot500)))
         }

I don't quite get what you're doing but why do you need `assign` here? Why not just return the list? — Dason, Jun 22 '16 at 05:36
Try adding the argument `envir = .GlobalEnv` to the assign call in the function. This should create the object named "aot500.XXXX" in the user environment. — Adam Quek, Jun 22 '16 at 05:52
Instead of assigning variables names on the fly within the list, you might prefer to have one function that returns the full data frame `inp.aod`. Then another function that filters what data you want from there. Give a sample of the csv file and a sample of the desired output so that we can help you with the design of this function. — Paul Rougieux, Jun 22 '16 at 07:01
@AdamQuek I have tried setting the environment, but it still does not give what I need. I share a link to my sample input data in my next comment. Please help! — jkp, Jun 22 '16 at 13:05
@PaulRougieux: Here is a link to my sample input data (https://www.dropbox.com/s/fx0klyi1byfo1yv/daily_CM.dat?dl=0) — jkp, Jun 22 '16 at 13:05
@jkp Here is how to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451#5965451). — Paul Rougieux, Jun 22 '16 at 14:51

score 3 · Answer 1 · answered Jun 22 '16 at 07:23

There is almost never a need to use assign(). We can solve the problem in two steps: read the files into a list, then name the list elements.

(Not tested as we don't have your files)

prod <- c('inv', 'tot')
site <- c('AF', 'CM', 'RB')

# get combo of site and prod
prod_site <- expand.grid(prod, site)
colnames(prod_site) <- c("prod", "site")

# Step 1: read the files into a list
res <- lapply(1:nrow(prod_site), function(i){
  fname.dsa <- paste0("../data/site_data_",
                      prod_site[i, "prod"],
                      "/daily_",
                      prod_site[i, "site"],
                      ".dat")
  inp.aod <- read.csv(fname.dsa, 
                      skip = 4,
                      stringsAsFactors = FALSE,
                      na.strings = "N/A")
  inp.aod[, 13]
})

# Step 2: assign names to a list
names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")
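
A minimal sketch of why the return statement in the question produced a list without names, and how naming the list afterwards handles it. The numbers are made up: `aot500` here only stands in for the column read from a file, and `all_sites` is a hypothetical name.

```r
site <- c("AF", "CM", "RB")

# Made-up stand-in for the 13th column of one input file
aot500 <- c(0.12, 0.34, 0.56)

# assign() hands its value back to list(), so the element is stored
# without a name (the variable itself is created in the current environment).
unnamed <- list(assign(paste("aot500", site[1], sep = "."), aot500))
names(unnamed)   # NULL

# setNames() attaches the computed name to the list element directly.
named <- setNames(list(aot500), paste("aot500", site[1], sep = "."))
named$aot500.AF

# The same idea for several sites: build the list first, name it afterwards.
all_sites <- lapply(site, function(s) aot500)  # placeholder for reading each file
names(all_sites) <- paste("aot500", site, sep = ".")
names(all_sites)
```

With a named list, one call can return any number of columns, and each one is reachable as `all_sites$aot500.CM` or `all_sites[["aot500.CM"]]`.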

Here is a link to my sample input data (https://www.dropbox.com/s/fx0klyi1byfo1yv/daily_CM.dat?dl=0) — jkp, Jun 22 '16 at 13:06

Paul Rougieux · Answer 2 · 2016-06-23T17:49:15.457

I propose two answers, one based on dplyr and one based on base R. You'll probably have to adapt the filename in the readAOT_500 function to your particular case.

Base R answer

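#' Read the AOT_500 column from the file for a given product and site
#' @param prodsite character vector of 2 elements:
#' the name of a product and the name of a site
#' @param selectedcolumn names of the columns to keep
#' @param path prefix of the file name; it is pasted directly in front of
#' the product name, without a path separator
readAOT_500 <- function(prodsite, 
                        selectedcolumn = c("AOT_500"),
                        path = tempdir()){
    # Print which file is being read
    cat(path, prodsite, "\n")
    filename <- paste0(path, prodsite[1],
                       prodsite[2], ".csv")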
    dtf <- read.csv(filename, stringsAsFactors = FALSE)
    dtf <- dtf[selectedcolumn]
    dtf$prod <- prodsite[1]
    dtf$site <- prodsite[2]
    return(dtf)
}
# Load one file, for example (this needs the sample files created
# in the "Create 6 fake files" section of the dplyr answer)
readAOT_500(c("inv", "AF"))


listofsites <- list(c("inv","AF"),
                    c("tot","AF"),
                    c("inv", "CM"),
                    c( "tot", "CM"),
                    c("inv", "RB"),
                    c("tot", "RB"))
# Load all files in a list of data frames
prodsitedata <- lapply(listofsites, readAOT_500)
# Combine all data frames together
prodsitedata <- Reduce(rbind,prodsitedata)
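
For readers without the data files, here is a self-contained sketch of the same base R pattern. Everything in it is an assumption made for illustration: the folder `aot_demo`, the helper `read_one`, and the fake file contents are not from the question. It uses `file.path()` to build file names and `do.call(rbind, ...)` in place of `Reduce(rbind, ...)`.

```r
# Hypothetical demo data: one small CSV per product/site pair in a temp folder
demo_dir <- file.path(tempdir(), "aot_demo")
dir.create(demo_dir, showWarnings = FALSE)

combos <- expand.grid(prod = c("inv", "tot"),
                      site = c("AF", "CM", "RB"),
                      stringsAsFactors = FALSE)

for (i in seq_len(nrow(combos))) {
  fake <- data.frame(Date = sprintf("2016-06-%02d", 1:3),
                     AOT_500 = round(runif(3), 3))
  write.csv(fake,
            file.path(demo_dir, paste0(combos$prod[i], combos$site[i], ".csv")),
            row.names = FALSE)
}

# Read one file, keep AOT_500, and tag the rows with product and site
read_one <- function(prod, site, path = demo_dir) {
  dtf <- read.csv(file.path(path, paste0(prod, site, ".csv")),
                  stringsAsFactors = FALSE)
  dtf <- dtf["AOT_500"]
  dtf$prod <- prod
  dtf$site <- site
  dtf
}

# One data frame per combination, then stack them into a single data frame
pieces <- lapply(seq_len(nrow(combos)), function(i) {
  read_one(combos$prod[i], combos$site[i])
})
all_data <- do.call(rbind, pieces)
```

Keeping `prod` and `site` as columns means a single data frame holds all combinations, and a subset such as `all_data[all_data$site == "CM", ]` replaces the need for separately named variables.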

dplyr answer

I use Hadley Wickham's packages to clean data.

library(dplyr)
library(tidyr)

daily_CM <- read.csv("~/downloads/daily_CM.dat",
                     skip = 4,
                     sep = ",",
                     stringsAsFactors = FALSE,
                     na.strings = "N/A")

# Generate all combinations of product and site.
prodsite <- expand.grid(prod = c('inv','tot'),
                        site = c('AF','CM','RB')) %>%
    # Group variables to use do() later on
    group_by(prod, site)

Create 6 fake files by sampling from the data you provided

You can skip this section when you have real data. I used different sample lengths so that the number of observations differs for each site.

prodsite$samplelength <- sample(1:495,nrow(prodsite)) 
prodsite %>% 
    do(stuff = write.csv(sample_n(daily_CM,.$samplelength),
                         paste0(tempdir(),.$prod,.$site,".csv")))

Read many files using dplyr::do()

prodsitedata <- prodsite %>%
    do(read.csv(paste0(tempdir(),.$prod,.$site,".csv"),
                stringsAsFactors = FALSE))
# Select only the columns you are interested in
prodsitedata2 <- prodsitedata %>%
    select(prod, site, AOT_500)

Thank you! However, none of these libraries is available for R-3.1.0, and I am not familiar with '%>%' and 'group_by', so none of this works on my system. I am sorry, but I wonder whether it can be done with the base package alone. — jkp, Jun 22 '16 at 17:42
Can you update R to the latest version? Then the `dplyr` package should install without problems. `%>%` is a chaining operator, also called a pipe; you'll find nice explanations with examples in the [dplyr vignette](https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html#chaining). Of course the same can be done with base R, by using `lapply` as in @zx8754's answer. — Paul Rougieux, Jun 23 '16 at 09:53
