116

I have a file:

ABCD.csv 

The part before the .csv is not fixed and can be of any length.

How can I extract the portion before the .csv?

AndrewGB
Matrix.cursor

9 Answers

202

The tools package, which ships with a standard R installation, has a built-in function, file_path_sans_ext, that returns the file name without its extension.

tools::file_path_sans_ext("ABCD.csv")
## [1] "ABCD"
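
As a small sketch of how the same function behaves on a vector of names, on a full path, and on a compressed file (the file names below are made up for illustration): the compression argument, which defaults to FALSE, also strips a trailing .gz, .bz2 or .xz before the extension is removed.

```r
# Illustrative file names only; none of these files need to exist
x <- c("ABCD.csv", "d:/Some Dir/ABCD.csv", "ABCD.csv.gz")

# Default: only the last extension is removed; the directory part is kept
plain <- tools::file_path_sans_ext(x)

# compression = TRUE first drops a trailing .gz/.bz2/.xz suffix
unzipped <- tools::file_path_sans_ext(x, compression = TRUE)

print(plain)
# [1] "ABCD"             "d:/Some Dir/ABCD" "ABCD.csv"
print(unzipped)
# [1] "ABCD"             "d:/Some Dir/ABCD" "ABCD"
```

Note that the directory part survives, which is why wrapping the path in basename() is often suggested when only the bare name is wanted.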
Jason V
Tyler Rinker
  • 10
    Anyone looking for more details on this and similar functions, take a look at `?tools::file_ext` – thelatemail Mar 18 '15 at 04:19
  • 6
    After testing, I think it's better to wrap the file path in `basename()`, as in `file_path_sans_ext(basename(filepath))`. – ah bon Dec 23 '21 at 07:30
49

basename will also remove the path leading to the file. And with this regex, any extension will be removed.

filepath <- "d:/Some Dir/ABCD.csv"
sub(pattern = "(.*)\\..*$", replacement = "\\1", basename(filepath))

# [1] "ABCD"

Or, using file_path_sans_ext as Tyler Rinker suggested:

tools::file_path_sans_ext(basename(filepath))

# [1] "ABCD"
Jason V
  • 2
    Special case: a file having "several extensions", like "ABCD.txt.csv" (yeah, it happens), then just add a '?' to make the expression non-greedy: `sub(pattern = "(.*?)\\..*$", replacement = "\\1", basename(filepath))` – Jason V Nov 14 '15 at 03:08
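
To make the greedy versus non-greedy difference from the comment above concrete, here is a minimal sketch. Passing perl = TRUE is my own assumption, chosen so that the lazy quantifier is handled by the PCRE engine; the original snippets do not set it.

```r
f <- "ABCD.txt.csv"

# Greedy group: (.*) grabs as much as possible, so only the last extension goes
greedy <- sub("(.*)\\..*$", "\\1", f, perl = TRUE)

# Lazy group: (.*?) stops at the first dot, so every extension goes
lazy <- sub("(.*?)\\..*$", "\\1", f, perl = TRUE)

print(greedy)
# [1] "ABCD.txt"
print(lazy)
# [1] "ABCD"
```

Which of the two is right depends on whether ".txt" counts as part of the name or as part of the extension.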
29

You can use sub or substr

sub('\\.csv$', '', str1) 
#[1] "ABCD"

or

substr(str1, 1, nchar(str1)-4)
#[1] "ABCD"

Using the 'filepath' from @JasonV's post

sub('\\..*$', '', basename(filepath))
#[1] "ABCD"

Or

library(stringr)
str_extract(filepath, '(?<=[/])([^/]+)(?=\\.[^.]+)')
#[1] "ABCD"

data

str1 <- 'ABCD.csv'
akrun
  • 1
    Yes, it would remove too. Why do you need the `.` after the `\\.` Could that be also a `.` literally i.e. `foo..` – akrun Sep 17 '20 at 15:18
  • You are right of course, this was a typo. My bad. Now I cannot edit this anymore. – stephanmg Sep 17 '20 at 15:25
  • How about this: sub('\\.[^\\.]*$', '', "foo.txt")? This will remove the very last extension. – stephanmg Sep 17 '20 at 15:25
  • 1
    @stephanmg There could be edge cases like `foo.` Not sure what to do with those – akrun Sep 17 '20 at 15:26
  • Yes, I'd assume this is not sound input. Before applying your command, one could check this (Or even, we could modify the regex?) – stephanmg Sep 17 '20 at 16:05
  • 1
    @stephanmg I would say that regex is more of a custom case, i.e. it cannot be applied to all the general cases. If the OP mentions that he/she will only have `.` at the end and there are no other cases, this would work – akrun Sep 17 '20 at 16:07
  • 1
    Okay, I think this is fine then. – stephanmg Sep 18 '20 at 07:22
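
The edge cases raised in these comments can be checked directly. The sketch below compares the "remove the last extension" regex from the comments with tools::file_path_sans_ext on a few awkward inputs; the input names are invented for illustration.

```r
# Awkward inputs: trailing dot, double dot, dotfile, no extension at all
edge <- c("foo.txt", "foo.", "foo..", ".bashrc", "foo")

# Regex from the comments: drop the last dot and everything after it
by_regex <- sub("\\.[^.]*$", "", edge)

# tools version: needs a non-dot prefix and an alphanumeric extension
by_tools <- tools::file_path_sans_ext(edge)

print(data.frame(input = edge, by_regex, by_tools))
```

The regex turns ".bashrc" into an empty string and strips a bare trailing dot, while file_path_sans_ext leaves "foo.", "foo.." and ".bashrc" unchanged, so the two are not interchangeable on unusual names.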
11

fs::path_ext_remove() "removes the last extension and returns the rest of the path".

fs::path_ext_remove(c("ABCD.csv", "foo.bar.baz.txt", "d:/Some Dir/ABCD.csv"))

# Produces: [1] "ABCD"             "foo.bar.baz"      "D:/Some Dir/ABCD"
wibeasley
6

You can try this also:

data <- "ABCD.csv"
gsub(pattern = "\\.csv$", "", data)

#[1] "ABCD"

This also works on a vector of file names, for example:

data <- list.files(pattern = "\\.csv$")

Running the same gsub call on it removes the extension from every file name in the vector.
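
A self-contained sketch of that workflow, assuming a throwaway directory under tempdir() (the directory and file names are hypothetical and only created for the demonstration):

```r
# Create a scratch directory with two .csv files and one unrelated file
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("ABCD.csv", "sales_2020.csv", "notes.txt"))))

# List only the .csv files, then strip the extension from all of them at once
files <- list.files(tmp, pattern = "\\.csv$")
names_only <- gsub(pattern = "\\.csv$", "", files)

print(names_only)

# Clean up the scratch directory
unlink(tmp, recursive = TRUE)
```

Because gsub is vectorized, no loop is needed; the pattern is anchored with $ so that a ".csv" in the middle of a name is left alone.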

Agaz Wani
6

If your filenames may have multiple extensions and you want to strip off only the last one, you can try the following.

Consider the filename foo.bar.baz.txt. This

sub('\\.[^.]*$', '', "foo.bar.baz.txt")

will leave you with foo.bar.baz.

stephanmg
2

Here is an implementation that handles compressed files and vectors of paths:

remove.file_ext <- function(path, basename = FALSE) {
  compressions <- c("gzip", "gz", "bgz", "zip")
  out <- character(0)
  for (p in path) {
    # Work on the current element p, not on the whole vector
    fext <- tools::file_ext(p)
    inner <- tools::file_ext(tools::file_path_sans_ext(p, compression = FALSE))
    if (fext %in% compressions && nzchar(inner)) {
      # Compressed file: strip both the inner and the compression extension
      regex <- paste0("\\.", inner, "\\.", fext, "$")
    } else {
      regex <- paste0("\\.", fext, "$")
    }
    out <- c(out, sub(regex, "", p))
  }
  # Plain if/else: ifelse() with a scalar test would return a single element
  if (basename) basename(out) else out
}
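
For comparison, the same idea can be sketched without a loop. strip_ext is a hypothetical helper name of my own, not a function from any package, and the list of compression suffixes is simply copied from the answer above.

```r
# strip_ext() is a hypothetical helper, not part of tools or any other package
strip_ext <- function(path, compressions = c("gzip", "gz", "bgz", "zip")) {
  # Build an alternation such as "\\.(gzip|gz|bgz|zip)$"
  comp_re <- paste0("\\.(", paste(compressions, collapse = "|"), ")$")
  # Drop a trailing compression suffix if present, then the last extension
  tools::file_path_sans_ext(sub(comp_re, "", path))
}

stripped <- strip_ext(c("a.csv", "b.csv.gz", "dir/c.vcf.bgz"))
print(stripped)
# [1] "a"     "b"     "dir/c"
```

Both sub and file_path_sans_ext are vectorized, so the whole vector is processed in two calls; wrap the result in basename() if the directory part should go too.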
Roler
2

Loading the required library:

> library(stringr)

Extracting all the matches from the regex:

> str_match("ABCD.csv", "(.*)\\..*$")
     [,1]       [,2]  
[1,] "ABCD.csv" "ABCD"

Returning only the second part of the result, which corresponds to the group matching the file name:

> str_match("ABCD.csv", "(.*)\\..*$")[,2]
[1] "ABCD"

EDIT for @U-10-Forward:

It is basically the same principle as the other answer. Just that I found this solution more robust.

Regex wise it means:

  • () = a capture group

  • .* = any character except a newline, repeated any number of times

  • \\ is the escape notation (a backslash, doubled inside an R string), so \\. means a literal "."

  • .* = again, any characters, any number of times

  • $ = the match must end at the end of the input string

The logic is then that it will return the group preceding a "." followed by a group of characters at the end of the string (which equals the file extension in this case).
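
If stringr is not available, roughly the same capture-group extraction can be sketched in base R with regexec() and regmatches(). This is an equivalent I am suggesting, not part of the original answer, and the input names are illustrative.

```r
x <- c("ABCD.csv", "foo.bar.baz.txt", "no_extension")

# regexec() records the full match and each capture group per input string
m <- regmatches(x, regexec("(.*)\\..*$", x))

# Element 1 is the full match, element 2 the first group; NA when nothing matched
stems <- vapply(m, function(hit) hit[2], character(1))

print(stems)
# [1] "ABCD"        "foo.bar.baz" NA
```

As with str_match, a name without any dot produces NA rather than the unchanged input, which is worth handling if such names can occur.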

SJGD
2

The above answers are great, but I was interested in which is fastest when dealing with millions of paths at once. Using sub, as suggested in this SO question, seems to be the fastest way to get the filename out of the path. Comparing three of the methods above on top of that, tools::file_path_sans_ext is the fastest.

library(fs)
library(stringr)
library(microbenchmark)

files<-paste0("http://some/ppath/to/som/cool/file/",1:1000,".flac")

microbenchmark(
    fs::path_ext_remove(sub(".*/", "", files)),
    tools::file_path_sans_ext(sub(".*/", "", files)),
    str_extract(files,  '(?<=[/])([^/]+)(?=\\.[^.]+)')
    
) 
Unit: milliseconds
                                                expr     min       lq      mean   median      uq     max neval
          fs::path_ext_remove(sub(".*/", "", files)) 10.6273 10.98940 11.323063 11.20500 11.4992 14.5834   100
    tools::file_path_sans_ext(sub(".*/", "", files))  1.3717  1.44260  1.532092  1.48560  1.5588  2.4806   100
 str_extract(files, "(?<=[/])([^/]+)(?=\\\\.[^.]+)")  7.4197  7.62875  7.985206  7.88835  8.2311  9.4107   100
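
Before trusting timings like these, it is worth confirming that the compared expressions return the same values. The sketch below does that with base R only, on the same kind of input; the pure-regex variant is my own addition for comparison and was not part of the benchmark.

```r
files <- paste0("http://some/ppath/to/som/cool/file/", 1:1000, ".flac")

# Approach from the benchmark: strip the directory with sub(), then the extension
via_tools <- tools::file_path_sans_ext(sub(".*/", "", files))

# Pure-regex alternative (not benchmarked above): basename() plus one sub()
via_regex <- sub("\\.[^.]*$", "", basename(files))

# Both should yield "1", "2", ..., "1000"
stopifnot(identical(via_tools, via_regex))
print(head(via_tools))
# [1] "1" "2" "3" "4" "5" "6"
```

With the results confirmed equal, either expression can be dropped into microbenchmark() alongside the others.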
Abram Fleishman