I am trying to import multiple CSV files at once. The files all have exactly the same format (the same variables), so when I use the code found here, I cannot tell my datasets apart.

### the code I used
library(readr)
temp = list.files(pattern="*.csv", full.names=TRUE)
myfiles = lapply(temp, read_csv)

This code works fine, but I cannot distinguish my CSV files. Is there any way to use the same code, or maybe another approach, so that I can import multiple CSV files and still see the name of the CSV file attached to each imported dataset?

# this is an example of my output
 myfiles
[[1]]
# A tibble: 10 x 2
      mm     prob
   <dbl>    <dbl>
 1     0 0.0002  
 2     2 0.000300
 3     3 0.00580 
 4     4 0.007   
 5     5 0.006   
 6     8 0.02    
 7    10 0.032   
 8    12 0.015   
 9    13 0.045   
10    15 0.051   

[[2]]
# A tibble: 10 x 2
      mm    prob
   <dbl>   <dbl>
 1     1 0.002  
 2     2 0.003  
 3     3 0.00580
 4     4 0.007  
 5     5 0.006  
 6     6 0.01   
 7     7 0.03   
 8     8 0.011  
 9     9 0.02   
10    10 0.04   

[[3]]
# A tibble: 11 x 2
      mm   prob
   <dbl>  <dbl>
 1     0 0.0001
 2     4 0.0004
 3     5 0.0005
 4     8 0.007 
 5    10 0.0075
 6    15 0.03  
 7    20 0.042 
 8    23 0.05  
 9    25 0.052 
10    27 0.064 
11    30 0.071 

[[4]]
# A tibble: 10 x 2
      mm     prob
   <dbl>    <dbl>
 1     0 0.0002  
 2     2 0.000300
 3     3 0.00580 
 4     4 0.007   
 5     5 0.006   
 6     8 0.02    
 7    10 0.032   
 8    12 0.015   
 9    13 0.045   
10    15 0.051   

# my CSV files have different names: g1_a.csv, g2_b.csv, g3_c.csv ...

The desired output would look something like


 myfiles
[[1]]
# name of the file attached to the dataset
#g1_a
# A tibble: 10 x 2
      mm     prob
   <dbl>    <dbl>
 1     0 0.0002  
 2     2 0.000300
 3     3 0.00580 
 4     4 0.007   
 5     5 0.006   
 6     8 0.02    
 7    10 0.032   
 8    12 0.015   
 9    13 0.045   
10    15 0.051   

[[2]]
#g2_b
# A tibble: 10 x 2
      mm    prob
   <dbl>   <dbl>
 1     1 0.002  
 2     2 0.003  
 3     3 0.00580
 4     4 0.007  
 5     5 0.006  
 6     6 0.01   
 7     7 0.03   
 8     8 0.011  
 9     9 0.02   
10    10 0.04   

[[3]]
#g3_c
# A tibble: 11 x 2
      mm   prob
   <dbl>  <dbl>
 1     0 0.0001
 2     4 0.0004
 3     5 0.0005
 4     8 0.007 
 5    10 0.0075
 6    15 0.03  
 7    20 0.042 
 8    23 0.05  
 9    25 0.052 
10    27 0.064 
11    30 0.071 

Thank you in advance for your help.

Janet
  • Check this other question, I think it may help: https://stackoverflow.com/questions/65865409/read-multiple-files-but-keep-track-of-which-file-is-which-dataframe-in-r/65865668#65865668 – GuedesBF Feb 05 '21 at 01:01

4 Answers

Just add this line at the end of your code:

myfiles <- setNames(myfiles, basename(temp))
GordonShumway
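
As a rough sketch of the setNames() idea above (not part of the original answer): the example below uses base R's read.csv instead of readr's read_csv so that it runs without extra packages, and it writes two small demo files into a temporary folder. The folder name csv_demo and the demo data are assumptions made up for illustration. tools::file_path_sans_ext() is used here to drop the .csv extension from the names.

```r
# Hypothetical demo folder and files, created only for this illustration
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(mm = c(0, 2), prob = c(0.0002, 0.0003)),
          file.path(demo_dir, "g1_a.csv"), row.names = FALSE)
write.csv(data.frame(mm = c(1, 2), prob = c(0.002, 0.003)),
          file.path(demo_dir, "g2_b.csv"), row.names = FALSE)

# Same pattern as in the question: list the files, then read each one
temp <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
myfiles <- lapply(temp, read.csv)

# Name each list element after its file, without directory or extension
myfiles <- setNames(myfiles, tools::file_path_sans_ext(basename(temp)))

names(myfiles)      # the file names, e.g. "g1_a" and "g2_b"
myfiles[["g1_a"]]   # a dataset can now be pulled out by file name
```

With names attached, printing myfiles shows $g1_a, $g2_b, ... instead of [[1]], [[2]], ..., which should be close to the desired output in the question.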

Maybe you should try:

library(readr)
library(stringr)

filenames = list.files(pattern="\\.csv$", full.names=TRUE)
myfiles = lapply(filenames, read_csv)

# I added this line and it is working
myfiles = setNames(myfiles, basename(filenames))

# strip the ".csv" extension from the names
names(myfiles) <- str_remove(names(myfiles), "\\.csv$")

Janet
GuedesBF
  • I used your approach but I get this error: `Error in str_replace(string, pattern, "") : argument "pattern" is missing, with no default` – Janet Feb 05 '21 at 01:18
  • There was one parenthesis missing. Fixed it – GuedesBF Feb 05 '21 at 01:22
  • Yes, I know, this is the weird thing! I copied your code and it is giving me an error about `str_replace`, which you are not using. ?? – Janet Feb 05 '21 at 01:25
  • Removed my first comment. str_remove() actually implicitly calls str_replace(). Try my updated code. – GuedesBF Feb 05 '21 at 01:27
  • It is working now but giving me an `NA` as names in the list. I cleared everything and tried again but am still seeing `NA`. Any thoughts? – Janet Feb 05 '21 at 01:31
  • So sorry, I got some of the variable names wrong. I think it is ok now – GuedesBF Feb 05 '21 at 01:34
  • Actually I added @GordonShumway's line of code to yours before using `str_remove` and it is working! – Janet Feb 05 '21 at 01:34
  • If you believe a question was adequately answered, you can accept the answer. – GuedesBF Feb 05 '21 at 01:37
  • The code is still giving an `NA` as names, but I combined both answers, yours and @GordonShumway's, to run the code – Janet Feb 05 '21 at 01:39

There is also a package called libr that is designed for this situation exactly. It will load a directory of data sets into a list, with each list item named according to the file name. It is very easy to use. Here is an example:

library(libr)

libname(dat, "<directory>", "csv")

Your datasets will be loaded into the variable named "dat". You can then also load them into the workspace with the following command:

lib_load(dat)

The datasets will be loaded with a two-level syntax, like: dat.g1_a, dat.g2_b, dat.g3_c, etc. so it is easy to reference them.

When you are done, just unload them, and it will clean up the workspace:

lib_unload(dat)

David J. Bosak
  • This is really amazing and fast and gives a lot of information about the imported files! Thank you very much @David! – Janet Feb 05 '21 at 02:54

You can use sapply with simplify = FALSE, which names the list elements directly (each element is named after the corresponding file path in temp).

temp = list.files(pattern="*.csv", full.names=TRUE)
result <- sapply(temp, read.csv, simplify = FALSE)
Ronak Shah
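
A small sketch of how the sapply() approach above behaves, under some assumptions: the temporary folder sapply_demo and its two demo files are invented for illustration, and base R's read.csv is used so nothing beyond base R is needed. Because sapply() names the result after the character vector it iterates over, the names are the full file paths when full.names = TRUE, so the last step (an optional addition, not from the answer) trims them to bare file names.

```r
# Hypothetical demo folder and files, created only for this illustration
demo_dir <- file.path(tempdir(), "sapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(mm = c(0, 2), prob = c(0.0002, 0.0003)),
          file.path(demo_dir, "g1_a.csv"), row.names = FALSE)
write.csv(data.frame(mm = c(0, 4), prob = c(0.0001, 0.0004)),
          file.path(demo_dir, "g3_c.csv"), row.names = FALSE)

temp <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# simplify = FALSE keeps the result as a list; the elements are named
# after the entries of temp (here: the full file paths)
result <- sapply(temp, read.csv, simplify = FALSE)
raw_names <- names(result)

# Optional cleanup: keep only the file name, without the extension
names(result) <- tools::file_path_sans_ext(basename(raw_names))

names(result)
```

Keeping the full paths as names can also be useful, for example when files from several folders share the same base name.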