I have a folder with a bunch of CSV files. I set the working directory to that folder and extracted the file names:

data_dir <- "~/Desktop/All Waves Data/csv"  
setwd(data_dir)  
vecFiles <- list.files(data_dir)

All good so far. The problem comes when I try to load all of the files with a loop over vecFiles:

for(fl in vecFiles) { 
fl <- read.csv(vecFiles[i], header = T, fill = T) 
}

When it comes to naming, the loop treats 'fl' as a literal name, so only the last file ends up saved under 'fl' (each iteration overwrites the previous one).

I tried to figure out why this happens but failed.
Any explanation?

Edit: I am trying to achieve the following: given a folder with data1.csv, data2.csv, ..., datan.csv, I want to load them into separate data frames named data1, data2, ..., datan.

RiskyMaor
  • You are overwriting the value of fl in every iteration of the loop. You should use the function sapply instead. – KenHBS Sep 27 '17 at 15:33
  • What do you want to achieve? – vaettchen Sep 27 '17 at 15:37
  • To figure out *why* it happens that way go ahead and completely ignore the for loop. For example I'll rename some things `j <- read.csv(j, header = T, fill = T)` if you look at that is there any reason why you wouldn't expect the result to be written into a variable with the literal name 'j'? There is nothing different when you stick the code into a for loop. – Dason Sep 27 '17 at 15:38
  • Iterating over i results the same: for(i in 1:length(vecFiles)) { name <- vecFiles[i] ; name <- read.csv(vecFiles[i], header = T, fill = T) } – RiskyMaor Sep 27 '17 at 15:40
  • @Dason outside of the loop it works OK: "apr16.csv" <- read.csv("apr16.csv", header = T, fill = T) will load a data frame named after the file. This is what I am trying to achieve, vaettchen, only without the .csv – RiskyMaor Sep 27 '17 at 15:45
  • But that isn't what you have. You have fl. Imagine this instead: `fl <- "hey"; fl <- 3` Should 3 be stored in fl or in "hey"? It's going to be stored in fl. Why would it assume that it should store the value into the string stored in fl? There are ways to do that but it doesn't make sense for it to do that without you telling it explicitly to do that. – Dason Sep 27 '17 at 15:47
  • @Dason I'm getting what you are saying, but I don't think it's the problem? See the new example in the edited post - I think you don't have this problem anymore (using name as a temporary variable) – RiskyMaor Sep 27 '17 at 16:03
  • Last line in the loop should be: string <- read.csv(string), which works fine outside of a loop. – RiskyMaor Sep 27 '17 at 16:04
  • I still don't see why you think this should assign to anything other than a variable named "name". It's doing exactly what you're telling it to. – Dason Sep 27 '17 at 16:10
  • Ok I got you, you are right it just overwrites 'name'. For some reason I thought that in the second line 'name' will get the assigned string from previous line. I feel now like I asked a dumb question. – RiskyMaor Sep 27 '17 at 16:18
  • Nah. Lots of people get tripped up by this issue. Nobody is going to perfectly understand everything about the language right off the bat. – Dason Sep 27 '17 at 17:04

2 Answers

You want to read in all CSV files from your working directory, and you have the names of those files saved in vecFiles.

Why your attempt doesn't work

What you are currently doing doesn't work because you overwrite the object fl with the newly loaded CSV file in every iteration. After all iterations have run, you are left with only the last overwritten fl object.

Another example to clarify why fl only contains the value of the last CSV file: if you assign fn <- "abc" on line 1 and then fn <- "def" on line 2 (i.e. you overwrite fn from line 1), fn will hold "def" after line 2:

fn <- "abc"
fn <- "def"
fn
#[1] "def"

Solutions

There are two prominent ways to solve this: 1) stick with a slightly altered for-loop. 2) Use sapply().

1) The altered for loop: Create an empty list called fn, and assign the loaded csv files to the i-th element of that list in every iteration:

fn <- list()
for(i in seq_along(vecFiles)){
  fn[[i]] <- read.csv(vecFiles[i], header = TRUE, fill = TRUE)
}
names(fn) <- vecFiles
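
A self-contained sketch of the same list-based loop, run against two throwaway files so it can be tried anywhere. The temporary folder, the file names data1.csv and data2.csv, and the use of tools::file_path_sans_ext() to drop the .csv suffix from the names are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical setup: write two small CSV files into a temporary folder.
tmp_dir <- file.path(tempdir(), "csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(tmp_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(tmp_dir, "data2.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so no setwd() is needed.
vecFiles <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Pre-allocate the list, then fill one element per file.
fn <- vector("list", length(vecFiles))
for (i in seq_along(vecFiles)) {
  fn[[i]] <- read.csv(vecFiles[i], header = TRUE, fill = TRUE)
}

# Name each element after its file, without the directory or the extension.
names(fn) <- tools::file_path_sans_ext(basename(vecFiles))

str(fn$data1)
```

Each data frame is then reachable by name, for example fn$data1 or fn[["data2"]].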

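2) Use sapply(): sapply() is a function that R users often reach for instead of a for loop. By default it tries to simplify its result, which can turn a set of data frames into a matrix of columns, so pass simplify = FALSE to get a list back:

fn <- sapply(vecFiles, read.csv, header = TRUE, fill = TRUE, simplify = FALSE)
names(fn) <- vecFiles

Note that you can also use lapply() instead of sapply(). lapply() always gives you a list as output, with one data frame per file, so it needs no extra argument.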
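
To see how lapply() and sapply() differ when each call returns a data frame, here is a small sketch along the lines of the example given in the comments. It uses the built-in iris data in place of real CSV files; the variable names are illustrative only.

```r
# Built-in iris data stands in for the CSV contents.
sizes <- c(10, 20, 30)

as_list   <- lapply(sizes, function(n) iris[1:n, ])
as_simple <- sapply(sizes, function(n) iris[1:n, ])

class(as_list)            # a plain list
class(as_list[[1]])       # each element is still a data frame
dim(as_simple)            # sapply() has rearranged the columns into a matrix
is.data.frame(as_simple)  # the data frame structure is gone

# simplify = FALSE makes sapply() keep one data frame per element.
kept <- sapply(sizes, function(n) iris[1:n, ], simplify = FALSE)
class(kept[[2]])
```

Because every data frame here has the same number of columns, the default simplification kicks in, which is why lapply() (or simplify = FALSE) is the safer choice when reading files.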

KenHBS
  • "The only difference is that lapply() gives you a list as output (fn will be a list) and sapply() gives you a normal data.frame as output (fn will be a normal data.frame)" NO!! Please test with some code your assertion. This is very wrong and deserves to be deleted ASAP. Here you need to use `lapply`. – nicola Sep 27 '17 at 17:42
  • I don't want to nitpick too much, but using `sapply` here is very wrong. You seem to not fully understand the difference between `sapply` and `lapply`. In this answer shouldn't even be a mention of `sapply`. – nicola Sep 27 '17 at 17:47
  • Why do you say it's very wrong? The first comment i understand, but I don't understand why sapply is wrong? Especially because it works well, as far as I'm concerned – KenHBS Sep 27 '17 at 17:49
  • Try for instance `lapply(seq(10,50,by=10),function(x) iris[1:x,])` and `sapply(seq(10,50,by=10),function(x) iris[1:x,])` to see the difference in the output. Are you aware of what `sapply` does more than `lapply`? – nicola Sep 27 '17 at 17:51
  • It does the same as lapply, except that it returns a vector or matrix. – KenHBS Sep 27 '17 at 17:56
  • A vector or matrix? Is a `list` a vector for you? Are you sure that `sapply` returns always a vector or a matrix? – nicola Sep 27 '17 at 17:58
  • From the lapply help page: "`sapply` is a user-friendly version and wrapper of `lapply` by default returning a vector, matrix or, if `simplify = "array"`, an array if appropriate ... ". So yes, a vector or matrix. However, I checked your example with lapply and sapply, and more seems to be going on there than just the difference in output type. – KenHBS Sep 27 '17 at 18:46
  • Is there a way to load it like the 1st way you presented, while not saving it directly into a list? Meaning that I'll have 4 separate data frames in the environment – RiskyMaor Sep 28 '17 at 08:38
  • If all csv file have the same columns, you could use fn <- do.call(rbind, fn) after the loop – KenHBS Sep 28 '17 at 08:41
  • OK, so there isn't really a way to take "them out" of the list; I should work by iterating over the list afterwards (it's easier to work with them separately than to combine them into one huge matrix) – RiskyMaor Sep 28 '17 at 09:08
  • What I understand from your comment is that you'd like to have all CSV files in completely separate objects, right? In most cases I encounter, I find it convenient to have them saved in one list. – KenHBS Sep 28 '17 at 11:22

You're not creating a new object each time you load a file. Every read is assigned to fl, so in the end you only see the last file in vecFiles.

A couple of potential solutions.

First lapply:

fl <- lapply(vecFiles, function(x) read.csv(x, header = TRUE, fill = TRUE))
names(fl) <- vecFiles

This will create a list in fl with one element per file.

Second 'rbind':

Under the assumption your data has all the same columns:

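# Start with the first file, then append the remaining ones row-wise.
fl <- read.csv(vecFiles[1], header = TRUE, fill = TRUE)

# f is a file name here, so it is passed to read.csv() directly.
for (f in vecFiles[-1]) {
  fl <- rbind(fl, read.csv(f, header = TRUE, fill = TRUE))
}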
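
As an alternative to growing the data frame inside a loop, the files can be read into a list first and bound in one call. The sketch below is only an illustration: the temporary folder, the file names wave1.csv and wave2.csv, and the added source column are assumptions, and it relies on all files having the same columns.

```r
# Hypothetical setup: two CSV files with identical columns.
tmp_dir <- file.path(tempdir(), "rbind_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)),
          file.path(tmp_dir, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(2.5, 3.5)),
          file.path(tmp_dir, "wave2.csv"), row.names = FALSE)

vecFiles <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file they came from.
pieces <- lapply(vecFiles, function(path) {
  df <- read.csv(path, header = TRUE, fill = TRUE)
  df$source <- basename(path)
  df
})

# Bind all pieces in one call instead of growing the data frame in a loop.
combined <- do.call(rbind, pieces)
head(combined)
```

The source column keeps track of which rows belong to which file, so the combined data frame can still be split or filtered per file later.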

Hopefully that is helpful!

Badger
  • I'm trying to load them into different matrices named after the file name, not one big one. – RiskyMaor Sep 28 '17 at 08:20
  • Once you have the big object `fl`, use `names(fl) <- vecFiles`; then you can use `fl$file1` to access the data. Creating a number of named objects isn't super efficient, but I think this solution should serve your purpose. This suggestion applies after using the `lapply` option. – Badger Sep 28 '17 at 12:07
  • The answer you want is here: https://stackoverflow.com/questions/19255289/for-loop-object-names-iterate But as Senor O mentions, this is not a good way to perform the task you are desiring to complete. – Badger Sep 28 '17 at 14:47
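
For the separate-objects case raised in the question's edit and in the comments, one possible sketch is to build the named list first and then copy its elements into the environment with list2env(). The temporary folder and file names below are assumptions for illustration; as the comments note, keeping the data frames in a list is usually the more convenient design.

```r
# Hypothetical setup: two small CSV files in a temporary folder.
tmp_dir <- file.path(tempdir(), "assign_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(tmp_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(tmp_dir, "data2.csv"), row.names = FALSE)

vecFiles <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read everything into a list named after the files, minus the extension.
frames <- lapply(vecFiles, read.csv, header = TRUE, fill = TRUE)
names(frames) <- tools::file_path_sans_ext(basename(vecFiles))

# list2env() copies each named element into the environment as its own object.
invisible(list2env(frames, envir = globalenv()))

# The per-file equivalent uses assign() with the name held in a string:
# for (nm in names(frames)) assign(nm, frames[[nm]], envir = globalenv())

str(data1)
```

This is what a plain `fl <- read.csv(...)` cannot do: the target name lives in a string, so it has to be handed to assign() or list2env() explicitly.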