I'm trying to clean some data in a file. The field I'm trying to clean describes which file each record originally came from, so every value in the field ends in ".csv". I would like to remove this part of the value but keep the rest.

Here is an example of the field:

File Name
bagel.csv
donut.csv
hamburger.csv
carrots.csv

I would like the field to look something like this:

File Name
bagel
donut
hamburger
carrots

Is there a way to do this in R? Any assistance would be extremely appreciated.

dario
QMan5

3 Answers

It's always better to provide a minimal reproducible example:

field <- c("aa.csv", "bb.csv", "cc.csv")

gsub("\\.csv$", "", field)

Returns:

[1] "aa" "bb" "cc"

Explanation:

We can use a regular expression to replace the sequence "." (\\.), followed by "csv" (csv), followed by the end of the string ($), with an empty string ("").

Following the suggestion from @G5W and anchoring the pattern with $ ensures that we only remove the extension and don't accidentally replace the string if it appears in the middle of a value. For example, in "function.csv.txt" we wouldn't want to replace the ".csv" part.
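
To illustrate the point about anchoring, here is a small sketch comparing the anchored and unanchored patterns. The sample values (including "report.csv") are made up for illustration and are not from the question.

```r
# Hypothetical sample values, chosen to show the effect of the "$" anchor
field <- c("bagel.csv", "function.csv.txt", "report.csv")

# Anchored: ".csv" is removed only when it ends the string
anchored <- gsub("\\.csv$", "", field)
# gives "bagel", "function.csv.txt", "report"

# Unanchored: ".csv" is removed wherever it occurs
unanchored <- gsub("\\.csv", "", field)
# gives "bagel", "function.txt", "report"
```

Because the anchored pattern can match at most once per string, sub() would give the same result as gsub() here.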

dario

You can also use dplyr

library(dplyr)

df <- data.frame(FileName = c('bagel.csv','donut.csv','hamburger.csv','carrots.csv'))

df <- df %>% mutate(FileName = gsub("\\..*","",FileName))
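
Note that the pattern "\\..*" removes everything from the first dot onward, which matters if a name contains more than one dot. A minimal sketch of the difference, assuming a made-up value "my.data.csv" and the hypothetical column names first_dot and csv_only:

```r
library(dplyr)

# Hypothetical data: the second name contains an extra dot
df <- data.frame(FileName = c("bagel.csv", "my.data.csv"))

res <- df %>%
  mutate(
    first_dot = gsub("\\..*", "", FileName),  # drops everything from the first dot
    csv_only  = sub("\\.csv$", "", FileName)  # drops only a trailing ".csv"
  )

res
# first_dot is "bagel", "my"; csv_only is "bagel", "my.data"
```

If the file names never contain extra dots, as in the question, both patterns give the same result.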
camnesia

We can use file_path_sans_ext from the tools package:

tools::file_path_sans_ext(field)
#[1] "aa" "bb" "cc"

data

field <- c("aa.csv", "bb.csv", "cc.csv")
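
One thing to keep in mind is that file_path_sans_ext strips any extension, not just ".csv", and only the last one. A short sketch with made-up file names:

```r
# Hypothetical file names with different extensions
mixed <- c("aa.csv", "notes.txt", "archive.tar.gz")

stripped <- tools::file_path_sans_ext(mixed)
stripped
# gives "aa", "notes", "archive.tar"
```

If only ".csv" should be removed and other extensions kept, the anchored gsub pattern is the more selective choice.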
akrun