
I use the code aa <- read.csv("W:/project/data4try1.csv") to read a file into the data frame aa. I want to create a new column (say, filename) in aa that holds the file name "data4try1" (without ".csv") in every row. The result should look like this:

filename, var1, var2
data4try1,123,456
data4try1,001,abc
data4try1,bc,786
pascal
TGG
  • I will read many files later and then combine them into one large data frame, so I need to know which file each row came from. – TGG Feb 04 '21 at 16:07

1 Answer


For a one-off, do it like this:

filepath = "W:/project/data4try1.csv"
filename = basename(filepath)
filename_no_ext = sub(pattern = "\\.[^\\.]+$", replacement = "", filename)

aa <- read.csv(filepath)
aa$filename = filename_no_ext
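
As an alternative to the hand-written regex, the tools package that ships with R has file_path_sans_ext(), which strips the final extension. The sketch below is only an illustration, not part of the original answer: it reuses the path from above purely as a string (no file is read), and the variable names stem and dotted are made up for the example.

```r
# Same path as in the answer; only the string is manipulated, no file is read
filepath <- "W:/project/data4try1.csv"

# basename() drops the directory part; file_path_sans_ext() drops the last extension
stem <- tools::file_path_sans_ext(basename(filepath))
stem  # "data4try1"

# Dots elsewhere in the name are preserved; only the final extension is removed
dotted <- tools::file_path_sans_ext("my.great.file.csv")
dotted  # "my.great.file"
```

The value in stem could then be assigned with aa$filename = stem, exactly as filename_no_ext is used above.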

Depending on your use case, you could turn this into a function:

read.csv.addpath = function(filepath, ...) {
  filename = basename(filepath)
  filename_no_ext = sub(pattern = "\\.[^\\.]+$", replacement = "", filename)

  data <- read.csv(filepath, ...)
  data$filename = filename_no_ext
  return(data)
}

You might do better to use list.files to generate a vector of all file names and read them all at once; see "How to make a list of data frames" for examples of that. If you use data.table::rbindlist or dplyr::bind_rows on a named list of data frames, they can add the filename column for you based on the names of the list.
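
A minimal base-R sketch of that multi-file approach follows. It is an assumption-laden illustration rather than the code from the linked question: the temporary directory, the second file data4try2.csv, and the names demo_dir, paths, dfs, tagged and combined are all hypothetical, created here only so the example runs on its own.

```r
# Hypothetical demo data: two small CSV files written to a temporary directory
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(var1 = 1:2, var2 = c("a", "b")),
          file.path(demo_dir, "data4try1.csv"), row.names = FALSE)
write.csv(data.frame(var1 = 3:4, var2 = c("c", "d")),
          file.path(demo_dir, "data4try2.csv"), row.names = FALSE)

# Full paths of every .csv file in the directory (sorted alphabetically)
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file; name each list element after its file, minus the extension
dfs <- lapply(paths, read.csv)
names(dfs) <- sub("\\.[^\\.]+$", "", basename(paths))

# Prepend a filename column to each data frame, then stack them into one
tagged <- Map(function(d, nm) data.frame(filename = nm, d, stringsAsFactors = FALSE),
              dfs, names(dfs))
combined <- do.call(rbind, unname(tagged))
combined
```

If dplyr or data.table is installed, dplyr::bind_rows(dfs, .id = "filename") or data.table::rbindlist(dfs, idcol = "filename") applied to the named list dfs should produce the same column without the Map() step.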

Gregor Thomas
  • Hello Gregor, it works perfectly, thank you. One more question: could you explain what the pattern string means in sub(pattern = "\\.[^\\.]+$", replacement = "", filename)? I would like to learn it so that I don't have to ask this kind of question again. – TGG Feb 04 '21 at 21:36
  • That's regular expressions (regex). My goal is to match the last `.` in the string, and everything after it. This is a pretty tricky regex for a beginner, but `.` is special in regex, it normally means "any single character". To make it a literal `.`, I escape it with `\\`. So `\\.` matches a `.`. Then, I want to match everything *except* a `.` until the end of the string. Inside brackets, `^` means "not", so `[^\\.]` means "not a `.`" (escaping the `.` as above). The `+` is a quantifier, which modifies the "not a `.`" to mean "one or more", and `$` anchors at the end of the string. – Gregor Thomas Feb 05 '21 at 04:44
  • All of that complication is to make sure it would work even if your file names contained `.`, so `my.great.file.csv` would turn into `my.great.file` rather than just `my`. – Gregor Thomas Feb 05 '21 at 04:46
  • Congrats on 100k, GregorThomas! – r2evans Feb 09 '21 at 02:32
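
The regex explanation in the comments above can be checked with a short experiment. This is only a sketch: the file names in files are hypothetical test cases, and the "naive" pattern is included purely for contrast; it is not something the answer recommends.

```r
# Pattern from the answer: a literal dot, then one or more non-dot characters,
# then the end of the string
pattern <- "\\.[^\\.]+$"

# Hypothetical file names chosen to exercise different cases
files <- c("data4try1.csv", "my.great.file.csv", "archive.tar.gz", "README")

# Only the last dot and what follows it are removed; names without a dot are unchanged
stripped <- sub(pattern, "", files)
stripped  # "data4try1" "my.great.file" "archive.tar" "README"

# For contrast: a naive pattern that removes everything from the FIRST dot onward
naive <- sub("\\..*$", "", files)
naive  # "data4try1" "my" "archive" "README"
```

The difference between the two results on my.great.file.csv is the case the extra complexity in the pattern is meant to handle.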