0

I am trying to add a column to a set of uploaded csv files, and make the contents equal the filename (or if the filename string-matches) which look like the below:

File 1 has name "A.csv"

col1    Q1          Q2          Q3          Q4
string1 0.627978806 0.946522837 0.128846019 0.063116235
string2 0.828536955 0.267770335 0.840049453 0.079681286
string3 0.985605532 0.70420001  0.240774109 0.082583466

File 2 has name "B.csv", and looks the same as above. And so on. There can be several of these, the user decides.

Multiple files are uploaded as:

files = do.call(rbind, lapply(path, FUN = data.table::fread)

This creates a merged file like below (for example if two files were uploaded):

col1    Q1          Q2          Q3          Q4
string1 0.627978806 0.946522837 0.128846019 0.063116235
string2 0.828536955 0.267770335 0.840049453 0.079681286
string3 0.985605532 0.70420001  0.240774109 0.082583466
string4 0.627978806 0.946522837 0.128846019 0.063116235
string5 0.828536955 0.267770335 0.840049453 0.079681286
string6 0.985605532 0.70420001  0.240774109 0.082583466

I want to add a column (call it col2) that contains the name of the file for each unique file.

So the goal is the below:

 col1   col2 Q1         Q2          Q3          Q4
string1 A   0.627978806 0.946522837 0.128846019 0.063116235
string2 A   0.828536955 0.267770335 0.840049453 0.079681286
string3 A   0.985605532 0.70420001  0.240774109 0.082583466
string4 B   0.627978806 0.946522837 0.128846019 0.063116235
string5 B   0.828536955 0.267770335 0.840049453 0.079681286
string6 B   0.985605532 0.70420001  0.240774109 0.082583466

I have tried the below code, but it's only working if I specify the number of files uploaded, and create separate variables for each (modified a bit from the below, but you get the idea):

# create a var for the files uploaded names
file_names = input$file_input[['name']]

# loop through the uploaded files and add column of it matches the filename
for(i in 1:num_of_files){
  if(grepl("A", file_names[i])){
     col_name = "A"

  } else if (grepl("B", file_names[i]){
     col_name = "B"
}

}
oguz ismail
  • 1
  • 16
  • 47
  • 69
themli
  • 99
  • 1
  • 8
  • 1
    Check out `data.table::rbindlist()` (specifically the `idcol` argument). There is also a chance you may need to use `sapply()` with `simplify = FALSE` because `lapply()` *may* drop names here (I cannot remember if it does in this case). If you only want the file name (and you have a full path), then you should also check out `basename()` or just use regex. Hope this helps! If you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) people will be able to give you more specific advice – Andrew Jun 29 '20 at 12:55

1 Answers1

3

Get a vector of file names (including paths to be safe).

files <- list.files("/path", "\\.csv", full.names = TRUE)

Read in the data.tables as a list, with the list names being the file name (sans path and extension).

DTs <- lapply(files, data.table::fread)
names(DTs) <- tools::file_path_sans_ext(basename(files))

Row bind the list, and make a new column based on the list names.

DTs <- data.table::rbindlist(DTs, idcol = "col2")
rpolicastro
  • 1,265
  • 8
  • 14