
I am relatively new to R and I am working on creating one dataframe out of many .csv files that I have in different subfolders of the same folder. Thus far, I've got this:

Batting.files <- list.files(path = "~/LMB/Top 6 - 2019/Juegos/",
                            recursive = TRUE,
                            pattern = "(statsHomeBatting.csv|statsVisitorBatting.csv)",
                            full.names = TRUE)  # full paths, so setwd() is not needed

Batting.Logs <- do.call(rbind, lapply(Batting.files, read.csv, check.names = FALSE, sep = ";"))
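The `pattern` argument of `list.files()` is a regular expression matched against file names, so in the pattern above the `.` before `csv` matches any character, not only a literal dot. That is usually harmless, but if stray files ever get picked up, a stricter pattern may help. A small sketch comparing the two with `grepl()`; the `strict` pattern and the two non-matching file names are hypothetical, for illustration only:

```r
# Question-style pattern: the "." before csv is unescaped, so it matches any character
loose <- "(statsHomeBatting.csv|statsVisitorBatting.csv)"
# Hypothetical stricter alternative: literal dots, anchored to the whole file name
strict <- "^stats(Home|Visitor)Batting\\.csv$"

# Two real file names from the question plus two made-up near misses
candidates <- c("statsHomeBatting.csv", "statsVisitorBatting.csv",
                "statsHomeBattingXcsv", "old_statsHomeBatting.csv")

grepl(loose, candidates)   # TRUE TRUE TRUE TRUE
grepl(strict, candidates)  # TRUE TRUE FALSE FALSE
```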

The subfolders where I have the files look like this:

~\LMB\Top 6 - 2019\Juegos\Lanús at Ferro Feb 10

What I'd like is a new column (let's name it Batting.Logs$Game) that shows, for each row, the last part of the directory that row's file came from (in this case, Lanús at Ferro Feb 10).

I have searched through old answers but have not been able to get it done, so I am now unsure whether it can be done with the code I currently have.

Thanks in advance!

2 Answers


I think you want dirname() and basename() (see ?dirname and ?basename).

dirname(path) returns the part of the path up to but excluding the last path separator, or "." if there is no path separator.

basename(path) removes all of the path up to and including the last path separator (if any).
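A quick sketch of these behaviours on a relative path shaped like the ones in the question. The file name comes from the question; the folder is spelled without the accent here only to keep the snippet ASCII-safe:

```r
# Relative path as list.files() returns it when full.names = FALSE
p <- "Lanus at Ferro Feb 10/statsHomeBatting.csv"

dirname(p)             # "Lanus at Ferro Feb 10"  (everything before the last "/")
basename(p)            # "statsHomeBatting.csv"   (everything after the last "/")
basename(dirname(p))   # "Lanus at Ferro Feb 10"  (the immediate parent folder)

# No path separator at all: dirname() falls back to "."
dirname("statsHomeBatting.csv")            # "."

# Trailing separators are removed before the path is split
basename("Juegos/Lanus at Ferro Feb 10/")  # "Lanus at Ferro Feb 10"
```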

Example:

Given a data frame with two paths, get the immediate parent directory by first extracting the directory name and then taking the basename of the result.

d <- data.frame(path = c('path/to/some/file.csv', 'path/to/another/file.csv'),
                stringsAsFactors = FALSE)

d$file_dir <- basename(dirname(d$path))

d

#>                       path file_dir
#> 1    path/to/some/file.csv     some
#> 2 path/to/another/file.csv  another
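Applied to the question, a minimal base-R sketch could keep the `read.csv()` / `do.call(rbind, ...)` approach but tag each file's rows with its parent folder before binding. To make it runnable anywhere, it builds a throwaway folder tree under `tempdir()`; the game folder names, the file contents and the columns `Player`, `AB`, `H` are all hypothetical, and `Game` is the column name the question asks for:

```r
# Throwaway folder tree mimicking the question's layout.
# Folder names, file contents and columns are hypothetical.
root <- file.path(tempdir(), "Juegos")
games <- c("Game A at Game B Feb 10", "Game C at Game D Feb 11")
for (g in games) {
  dir.create(file.path(root, g), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("Player;AB;H", "P1;4;2"), file.path(root, g, "statsHomeBatting.csv"))
  writeLines(c("Player;AB;H", "P2;3;1"), file.path(root, g, "statsVisitorBatting.csv"))
}

# Full paths are returned, so read.csv() works from any working directory
Batting.files <- list.files(path = root, recursive = TRUE, full.names = TRUE,
                            pattern = "^stats(Home|Visitor)Batting\\.csv$")

# Read each file, add a Game column holding its parent folder's name, then bind
Batting.Logs <- do.call(rbind, lapply(Batting.files, function(f) {
  df <- read.csv(f, check.names = FALSE, sep = ";")
  df$Game <- basename(dirname(f))
  df
}))

Batting.Logs
```

On the real data, drop the setup part and point `path` at the Juegos folder.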
(answer by npjc)

If you combine this answer

https://stackoverflow.com/a/44304004/3438524 (to the question "Read multiple csv data and create new columns at one time")

with dirname and basename (as npjc already posted: https://stackoverflow.com/a/54888162/3438524), this should do the trick.

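library(data.table)  # provides fread() and rbindlist()

Batting.files <- list.files(path = "~/LMB/Top 6 - 2019/Juegos/",
                            recursive = TRUE,
                            pattern = "(statsHomeBatting.csv|statsVisitorBatting.csv)",
                            full.names = TRUE)
# simplify = FALSE keeps the result as a list, named by file path
dt.list <- sapply(Batting.files, fread, simplify = FALSE)
# idcol stores each list name (the path) in 'folder'; keep only the parent folder name
DT <- rbindlist(dt.list, idcol = 'folder')[, `:=`(folder = basename(dirname(folder)))]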
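The idcol approach relies on `sapply(..., simplify = FALSE)` returning a list named after its input, which `lapply()` on an unnamed character vector does not do. `rbindlist(idcol = ...)` copies those list names into the new column; with an unnamed list it stores 1, 2, 3, ... instead, and `dirname()` would then fail on the integer ids. The base-R sketch below shows only the naming step; the two paths are hypothetical, and `nchar()` stands in for `fread()` so it runs without data.table:

```r
# Two hypothetical relative paths, as list.files() returns without full.names = TRUE
files <- c("Game A at Game B Feb 10/statsHomeBatting.csv",
           "Game C at Game D Feb 11/statsVisitorBatting.csv")

# lapply() on an unnamed character vector returns an unnamed list
names(lapply(files, nchar))                # NULL

# sapply() (USE.NAMES = TRUE by default) names the list after the paths
named <- sapply(files, nchar, simplify = FALSE)
names(named)                               # the two paths above

# These names are what ends up in the idcol column, reduced to the parent folder
basename(dirname(names(named)))            # "Game A at Game B Feb 10" "Game C at Game D Feb 11"
```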
  • Hi @user3438524, sorry for the basic question. I've read the help for both `rbindlist` and `dirname`, as I am not familiar with them, but I'm having an issue with the last line of code. If I put my directory in `basename(dirname(*folder*))`, I get: *Cannot use := to add columns to a null data.table (no columns), currently*. If instead I simply leave `folder`, I get: *Error in dirname(folder) : object 'folder' not found*. I appreciate your help – Eliseo Avramides Feb 26 '19 at 16:49
  • Hi @Eliseo Avramides, what R is telling you (when you leave folder, without asterisks) is that the variable folder does not exist (folder is the name of the new data.frame column where the path is stored). Did you leave idcol = 'folder' in the last line? Try str(DT) to get an idea of the structure of your object (https://stat.ethz.ch/R-manual/R-devel/library/utils/html/str.html). What is the name of your first column? – user3438524 Feb 27 '19 at 09:11
  • Thanks a lot! It took me a while (mostly because I was on holiday), but I ended up nailing it like this: `dt.list.Batting <- sapply(Batting.files, fread, simplify=FALSE); Batting.by.Game <- rbindlist(dt.list.Batting, idcol = "id")[, `:=` (Game = basename(dirname(id)))]` – Eliseo Avramides Mar 25 '19 at 23:17