-2

I created a function that should assign CSV files to variables. However, it's not working and I have no idea why. Here's the code:

 file_id <- readline("Type a common word for your files: ")
 file_list <- list.files(pattern = file_id)

 multi.csv <- function(pattern.seq)
  {
    for(i in 1:length(pattern.seq))
       {
          assign(pattern.seq[i], read.csv(pattern.seq[i]))
       }
  }

multi.csv(pattern.seq = file_list)

cat(" Now your .csv files are stored in variables: ", file_list) 

Let me explain what's going on here. Let's assume that I have ten .csv files that differ only by a number in their names, e.g. file_01, file_02... The function list.files recognizes the pattern in the file names and stores the matches in a character vector c("file_01", "file_02"...). Then I wrote the function multi.csv, which should assign the data stored in each .csv file to a variable. However, it's not working. What's more, when I call the assign function outside the multi.csv function with a single element of the file_list vector:

assign(file_list[1], read.csv(file_list[1]))

it works as it should: variable file_01 stores the data from file_01.csv.

Have you got any idea why it's not working inside multi.csv function?

I'm fully aware that the whole problem of reading many csv files can be solved differently but I want to know what's wrong here.

p0l00ck
[Assign()](http://www.inside-r.org/r-doc/base/assign) expects a string literal in first argument. Do not use it to extend a list. Plus you are re-using the same variable without a return in function. Consider: `dfList <- lapply(file_list, read.csv)` – Parfait Mar 24 '16 at 02:18

2 Answers

1

The problem here is related to scope.

In R, every variable is stored in an environment.

  • There is a global environment which stores "top-level" variables. This is sometimes called the workspace or workspace environment.
  • There are package environments (actually two per package: one public and one private).
  • You can create custom environments with new.env().
  • You can copy data.frames, lists, and existing save() files to new environments with attach(), but this is discouraged.
  • There are a few fairly obscure and usually unimportant environments, such as the empty environment and the Autoload environment.
  • Finally, a new environment is created for each evaluation of a function, which is referred to as the evaluation environment for that particular evaluation of that particular function.
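A few of the environment kinds listed above can be inspected directly. This is only a minimal sketch; the names `e`, `show_env`, `env1`, and `env2` are made up for illustration:

```r
# The global environment (the workspace)
g <- globalenv()
environmentName(g)                        # "R_GlobalEnv"

# A custom environment created with new.env()
e <- new.env()
assign("x", 42, envir = e)                # x lives in e, not in the workspace
exists("x", envir = e, inherits = FALSE)  # TRUE

# Each function call gets its own fresh evaluation environment
show_env <- function() environment()      # returns the environment of this call
env1 <- show_env()
env2 <- show_env()
identical(env1, env2)                     # FALSE: two calls, two environments
```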

The documentation on assign() is a bit scattered when it comes to communicating exactly which environment receives the variable-to-be-assigned, but I'll try to quote all the relevant passages:

pos     where to do the assignment. By default, assigns into the current environment. See ‘Details’ for other possibilities.


envir     the environment to use. See ‘Details’.


The pos argument can specify the environment in which to assign the object in any of several ways: as -1 (the default), as a positive integer (the position in the search list); as the character string name of an element in the search list; or as an environment (including using sys.frame to access the currently active function calls). The envir argument is an alternative way to specify an environment, but is primarily for back compatibility.


Note that assignment to an attached list or data frame changes the attached copy and not the original object: see attach and with.


If no envir is specified, then the assignment takes place in the currently active environment.


If inherits is TRUE, enclosing environments of the supplied environment are searched until the variable x is encountered. The value is then assigned in the environment in which the variable is encountered (provided that the binding is not locked: see lockBinding: if it is, an error is signaled). If the symbol is not encountered then assignment takes place in the user's workspace (the global environment).

If inherits is FALSE, assignment takes place in the initial frame of envir, unless an existing binding is locked or there is no existing binding and the environment is locked (when an error is signaled).

Because you have not specified any of the pos, envir, or inherits arguments in your assign() call, the variable ends up being assigned into the "current" aka "active" environment. Because the assign() call takes place inside of a call to your multi.csv() function, the "current" aka "active" environment is the evaluation environment of that particular evaluation of your multi.csv() function. When the evaluation of multi.csv() completes, nothing refers to that environment any more, so it is discarded, and the variable is discarded with it (note: an evaluation environment survives if a closure captures it, but your code does not create any closure, so that doesn't apply here).
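The effect can be reproduced without any files. Here is a small sketch; `assign_local` and `demo_var` are hypothetical names standing in for multi.csv() and one of its variables:

```r
# Hypothetical stand-in for multi.csv(), with no file access
assign_local <- function(name) {
  assign(name, 1:3)   # lands in this call's evaluation environment
  # The variable is visible while the call is still running
  exists(name, envir = environment(), inherits = FALSE)
}

inside <- assign_local("demo_var")   # TRUE: it existed during the call

# After the call returns, the workspace has no such variable
after <- exists("demo_var", envir = globalenv(), inherits = FALSE)   # FALSE
```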

This also helps to explain why your top-level call to assign() works: because the "current" aka "active" environment in that case is the global environment, so the variable lands right where you expect it: in your workspace environment.

You can solve the problem by passing pos=1L or envir=globalenv() in your assign() call.
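For example, your function with envir=globalenv() added might look like the sketch below. The temporary directory and the two demo files are assumptions made only so the snippet runs on its own:

```r
multi.csv <- function(pattern.seq) {
  for (i in seq_along(pattern.seq)) {
    # envir = globalenv() sends each variable to the workspace
    assign(pattern.seq[i], read.csv(pattern.seq[i]), envir = globalenv())
  }
}

# Throwaway demo files in a temporary directory (hypothetical names)
old_wd <- setwd(tempdir())
write.csv(data.frame(a = 1:2), "file_01.csv", row.names = FALSE)
write.csv(data.frame(a = 3:4), "file_02.csv", row.names = FALSE)

file_list <- list.files(pattern = "^file_0[12]\\.csv$")
multi.csv(pattern.seq = file_list)
setwd(old_wd)

# The variable name is the full file name, so it contains ".csv"
# and has to be read with get() or backticks
exists("file_01.csv", envir = globalenv())   # TRUE
```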

(You could also solve the problem by passing inherits=TRUE, which is akin to using the superassignment operator <<-, but this is discouraged.)
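To make the comparison concrete, here is a sketch of both variants. The function and variable names are invented, and it assumes no variable with those names already exists in an enclosing environment. If one did, it would be overwritten there instead, which is one reason this route is discouraged:

```r
set_with_inherits <- function() {
  # No binding found in the enclosing environments, so the
  # assignment falls through to the global environment
  assign("demo_inherits", "via assign", inherits = TRUE)
}

set_with_superassign <- function() {
  # <<- walks the enclosing environments the same way
  demo_super <<- "via <<-"
}

set_with_inherits()
set_with_superassign()

exists("demo_inherits", envir = globalenv(), inherits = FALSE)   # TRUE
exists("demo_super", envir = globalenv(), inherits = FALSE)      # TRUE
```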


See here for more information.

bgoldst
0

Sorry, but why do you need any pattern at all? Can't you just loop through all files in a folder and merge the contents into one single dataset? Try the script below and see if it gets you what you want.

setwd("C:/Users/xxx/")

# list every file in the working directory
file_list <- list.files()

for (file in file_list){

  if (!exists("dataset")){
    # if the merged dataset doesn't exist, create it from the first file
    # (sep="\t" reads tab-separated files; use sep="," for comma-separated ones)
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # if the merged dataset does exist, append the current file to it
    # (using else keeps the first file from being read and added twice)
    temp_dataset <- read.table(file, header=TRUE, sep="\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}
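The same merge can probably be written without the exists() bookkeeping by reading every file into a list and binding the pieces once. This is just a sketch; the demo directory, file names, and contents are made up so that it runs on its own:

```r
# Demo files in a temporary directory (hypothetical names and contents)
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, v = c("a", "b")),
            file.path(demo_dir, "part1.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(id = 3:4, v = c("c", "d")),
            file.path(demo_dir, "part2.txt"), sep = "\t", row.names = FALSE)

# Read each file into a list of data frames, then bind them in one step
paths   <- list.files(demo_dir, pattern = "^part[12]\\.txt$", full.names = TRUE)
tables  <- lapply(paths, read.table, header = TRUE, sep = "\t")
dataset <- do.call(rbind, tables)

nrow(dataset)   # 4
```

Because the result is built in a single expression, nothing depends on whether a variable called dataset was already sitting in the workspace.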
ASH