
I have several .csv files of data stored in a directory, and I need to import all of them into R.

Each .csv has two columns when imported into R. However, the 1001st row needs to be stored as a separate variable for each of the .csv files (it corresponds to an expected value which was stored here during the simulation; I want it to be outside of the main data).

So far I have the following code to import my .csv files as matrices.

#List all .csv files in the working directory
dataFiles <- list.files(pattern="\\.csv$")

for(i in dataFiles) {
   #read each csv file into a variable named after the file
   name <- gsub("-",".",i)
   name <- gsub("\\.csv$","",name)
   assign(name,read.csv(i, header=TRUE))
}

This produces several matrices with the naming convention "sim_data_L_mu" where L and mu are parameters from the simulation. How can I remove the 1001st row (which has a number in the first column, and the second column is null) from each matrix and store it as a variable named "sim_data_L_mu_EV"? The main problem I have is that I do not know how to call all of the newly created matrices in my for loop.

wzbillings
  • [Use lists, it will be much simpler. See here for examples.](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) – Gregor Thomas Jun 11 '18 at 13:56
  • You could make a list in which each element is itself a list consisting of a data.frame and a scalar value. Using `lapply` instead of a `for` loop would be much more convenient and closer to the "R way" of doing things. – Gautam Jun 11 '18 at 19:53

1 Answer


I couldn't post long code in a comment, so I am writing it here:

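# Use dialog to select folder (choose.dir() is only available on Windows)
# Full names are required to access files that are not in the current working directory 
file_list <- list.files(path = choose.dir(), pattern = "\\.csv$", full.names = TRUE)
big_list <- lapply(file_list, function(z){
  df <- read.csv(z)
  # Row 1001 of the imported data holds the expected value (see the question)
  scalar <- df[1001, 1]
  # Drop that row so only the simulated data remains
  df <- df[-1001, ]
  return(list(df, scalar))
})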

To access the scalar value from the third file, you can use

big_list[[3]][[2]]

The elements in big_list follow the order of file_list so you always know which file the data comes from.
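
Because the order of big_list matches file_list, one optional refinement is to name the elements after their files, so that each result can be looked up by name rather than by position. The following is only a minimal sketch and not part of the original answer: the temporary directory, the two file names, and the element names `data` and `EV` are made up for illustration, and the expected value is assumed to sit in row 1001 of the imported data, as described in the question.

```r
# Build two small example files in a temporary directory (hypothetical data).
tmp_dir <- file.path(tempdir(), "sim_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in c("sim_data_1_2", "sim_data_3_4")) {
  sim <- data.frame(x = c(seq_len(1000), 0.5), y = c(seq_len(1000), NA))
  write.csv(sim, file.path(tmp_dir, paste0(nm, ".csv")), row.names = FALSE)
}

file_list <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# For each file, keep the main data and the expected value side by side.
big_list <- lapply(file_list, function(z) {
  df <- read.csv(z)
  list(data = df[-1001, ], EV = df[1001, 1])
})

# Name each element after its file, without directory or extension.
names(big_list) <- tools::file_path_sans_ext(basename(file_list))

big_list[["sim_data_3_4"]]$EV          # expected value for that file
nrow(big_list[["sim_data_3_4"]]$data)  # number of remaining data rows
```

With named elements, `big_list$sim_data_3_4$EV` plays the role that a separate variable such as "sim_data_L_mu_EV" would have played, without creating many objects in the global environment.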

If you use data.table::fread() instead of read.csv, you can also assign column names (col.names), choose which columns to read (select), and limit or offset the rows that are read (nrows, skip). It's also considerably faster for large data files.
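
As a rough illustration of the fread() options mentioned above, the sketch below reads the main data and the expected value in two separate calls. It assumes the data.table package is installed and that each file has a header line, 1000 data rows, and then the expected-value row; the example file and the column names `value` and `count` are invented for the demonstration.

```r
library(data.table)

# Hypothetical example file: header, 1000 data rows, then the expected value
# in the last row with an empty second column.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("x,y", paste(1:1000, 1:1000, sep = ","), "0.5,"), csv_file)

# Read only the first 1000 data rows and assign column names on the way in.
main_data <- fread(csv_file, nrows = 1000, col.names = c("value", "count"))

# Skip the header and the 1000 data rows, then read the single remaining line.
ev_row <- fread(csv_file, skip = 1001, nrows = 1, header = FALSE, sep = ",")
EV <- ev_row[[1]]
```

Reading the two parts separately means the main data never contains the expected-value row, so nothing has to be removed afterwards.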

Hope this helps!

Gautam