I am having trouble in RStudio creating a new dataset from each of several input datasets without them replacing each other. I initially had issues with importing multiple datasets and was able to figure that out with:

dataFiles <- Sys.glob("*.csv")

However, I am having trouble running a for loop over all of these datasets without changing the actual file.

For example,

for (file in dataFiles) {
  Data <- read.csv(file, sep = ",", header = T, skip = 2)
  } 

only keeps the last file that is read and erases all previous ones.

Is there a way to change the output name?

Edit: My current issue is that I am not able to save the data for each file in dataFiles; they are all replaced by the last dataset. The code inside the for loop does work.

jay.sf
Darwin Chang
  • Welcome to Stack Overflow! Please [format your code appropriately](https://meta.stackexchange.com/a/22189/371738); this time I took care of it. – jay.sf Jul 25 '18 at 06:00
  • Where is the part that you export graph as PDF? – Tung Jul 25 '18 at 06:01
  • `dataFiles` is just a vector of filenames, you haven't actually "imported" the files at that point. You can store all the individual datasets in a single list with `data_list <- lapply(dataFiles, function(file_name) read.csv(file_name, sep = ",", header = T, skip = 2))`, but you will have to adapt any subsequent code to work with the list, or pull the individual datasets out of it. – Marius Jul 25 '18 at 06:02
  • You can see what is going on if you add a `print(file)` as the first statement inside your for loop. Notice how all file names get printed. The kicker is that you are overwriting each imported data.frame with the next. If you mentally run this loop until the end, you will notice that only the last data.frame is left because there is no next one to overwrite it. Marius' solution is very elegant, but yours could work too if you initialize a list before the loop and then assign each imported data.frame to `Data[[file]]`. – Roman Luštrik Jul 25 '18 at 06:22
  • possible duplicate of https://stackoverflow.com/q/11433432/4137985 – Cath Jul 25 '18 at 06:59
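
The loop-based fix described in the comments above can be sketched as follows. This is only an illustration: the temporary directory and the two demo files are made up so that the snippet runs on its own, and they stand in for the asker's real CSV files.

```r
# Hypothetical demo files in a temporary directory (not the asker's data).
# Each file has two leading lines to skip, then a header, then the rows.
tmp <- tempfile("csv_demo_")
dir.create(tmp)
for (nm in c("a", "b")) {
  writeLines(c("junk line 1", "junk line 2", "x,y", "1,2", "3,4"),
             file.path(tmp, paste0(nm, ".csv")))
}

dataFiles <- Sys.glob(file.path(tmp, "*.csv"))

# Create the list once, then fill one element per file
# instead of reassigning a single variable on every iteration.
Data <- list()
for (file in dataFiles) {
  Data[[basename(file)]] <- read.csv(file, header = TRUE, skip = 2)
}

names(Data)  # one entry per file, keyed by file name
```

Each data.frame can then be pulled out by name, for example Data[["a.csv"]].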

1 Answer

We could use lapply() as suggested in comments, which will return a list.

Data <- lapply(dataFiles, read.csv)
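
Note that the loop in the question also passed header and skip to read.csv. A minimal sketch of how those options could be kept: any further arguments given to lapply() are handed on to the function it calls. The temporary files below are invented for the demo.

```r
# Hypothetical demo files: two metadata lines, then a header, then data.
tmp <- tempfile("csv_demo_")
dir.create(tmp)
writeLines(c("meta 1", "meta 2", "x,y", "1,2"),
           file.path(tmp, "a.csv"))
writeLines(c("meta 1", "meta 2", "x,y", "5,6", "7,8"),
           file.path(tmp, "b.csv"))

dataFiles <- Sys.glob(file.path(tmp, "*.csv"))

# header = TRUE and skip = 2 are forwarded to read.csv for every file.
Data <- lapply(dataFiles, read.csv, header = TRUE, skip = 2)

length(Data)        # number of files read
sapply(Data, nrow)  # rows per file
```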

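We can name the list elements after their file names in the working directory. Note that sub() takes a regular expression, not a glob pattern, so the extension is matched with an escaped dot and an end-of-string anchor:

names(Data) <- sub("\\.csv$", "", dataFiles)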

Or, as @Axeman suggests, with tools::file_path_sans_ext:

names(Data) <- tools::file_path_sans_ext(dataFiles)
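
To see what the two naming approaches produce, here is a small sketch on made-up file names (the names are assumptions for illustration only). Both strip just the final extension.

```r
# Hypothetical file names, including one with an extra dot.
dataFiles <- c("site_a.csv", "site_b.csv", "run.2018.csv")

byTools <- tools::file_path_sans_ext(dataFiles)
byRegex <- sub("\\.csv$", "", dataFiles)  # escaped dot, anchored at the end

byTools
byRegex
identical(byTools, byRegex)
```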

If we want the elements to appear in the global environment separately, we use list2env().

list2env(Data, globalenv())
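
One thing to keep in mind is that list2env() overwrites any existing object that has the same name as a list element. A cautious way to try it out is to target a fresh environment first. The sketch below uses small stand-in data.frames and a hypothetical environment called sandbox.

```r
# Stand-in data; in practice this would be the named list of imported files.
Data <- list(a = data.frame(x = 1:2),
             b = data.frame(x = 3:5))

# Copy the list elements into a separate environment instead of globalenv().
sandbox <- new.env()
list2env(Data, envir = sandbox)

ls(sandbox)      # names of the objects that were created
nrow(sandbox$b)  # each element is now an ordinary object in 'sandbox'
```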

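If we want to wrap this all together into a function, we could do:

importCsv <- function(x) {
  # Read every file and name each element after its file name without ".csv"
  Data <- setNames(lapply(x, read.csv), sub("\\.csv$", "", x))
  # Create one object per file in the global environment
  return(list2env(Data, globalenv()))
}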

Finally, importCsv(Sys.glob("*.csv")) (or, equivalently, importCsv(dataFiles)) would yield what we want.

jay.sf
  • One can use `tools::file_path_sans_ext` to remove file extensions. I would in most cases advise keeping the data.frames in a list; then you can just `lapply` again to create the plots for each set. So `Data <- setNames(lapply(dataFiles, read.csv), tools::file_path_sans_ext(dataFiles))` should do. – Axeman Jul 25 '18 at 07:44
  • @Axeman Ah yes you're right, this simplifies a lot, thank you for pointing this out. If you don't mind, I've included your suggestions into my answer. – jay.sf Jul 25 '18 at 08:16
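
As a rough sketch of the "keep everything in a list and lapply again" advice from the comments, the example below applies the same step to every dataset. The two data.frames are stand-ins, and a simple summary is used in place of the plotting code, which is not shown in the question.

```r
# Stand-in for the named list of imported data.frames.
Data <- list(a = data.frame(x = 1:3, y = c(2, 4, 6)),
             b = data.frame(x = 1:2, y = c(10, 20)))

# Apply the same step to each dataset; results keep the list names.
rowCounts <- vapply(Data, nrow, integer(1))
means <- lapply(Data, function(d) mean(d$y))

rowCounts
means
```

A plotting function could be passed to lapply() in the same way as the anonymous function here.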