I have the following column in an R data frame:

 file_name
 01.01.2017 -SS DPR.xlsx
 02.01.2017 -SS DPR.xlsx
 03.01.2017 -SS DPR.xlsx
 04.01.2017 -SS DPR.xlsx
 05.01.2017 -SS DPR.xlsx
 06.01.2017 -SS DPR.xlsx

I want to extract only the names, without the extension, from the column above:

 file_name
 01.01.2017 -SS DPR
 02.01.2017 -SS DPR
 03.01.2017 -SS DPR
 04.01.2017 -SS DPR
 05.01.2017 -SS DPR
 06.01.2017 -SS DPR

How can I drop the trailing extension characters from this data frame column in R?

Sotos
Neil

2 Answers


Try using gsub:

new_file_name <- gsub("(.*)\\.\\w+", "\\1", file_name)

This solution uses the pattern (.*)\.\w+, in which (.*) greedily captures everything up to the last dot, and \.\w+ matches that dot followed by an extension made of word characters. The replacement \1 keeps only the captured part. This is useful if you plan to have files other than Excel spreadsheets.

Output:

[1] "01.01.2017 -SS DPR" "02.01.2017 -SS DPR" "03.01.2017 -SS DPR"
[4] "04.01.2017 -SS DPR" "05.01.2017 -SS DPR" "06.01.2017 -SS DPR"
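
As a minimal sketch of how this pattern behaves on a data frame column, the snippet below assumes a data frame named `df1` with some made-up rows (both are assumptions, not taken from the question). It also adds a `$` anchor so that only a trailing extension is removed; the anchor is an addition to the pattern above.

```r
# Hypothetical data frame; the name df1 and the extra rows are assumptions.
df1 <- data.frame(
  file_name = c("01.01.2017 -SS DPR.xlsx",
                "02.01.2017 -SS DPR.csv",
                "no_extension"),
  stringsAsFactors = FALSE
)

# (.*) greedily captures everything up to the last dot;
# \\.\\w+$ matches that dot plus a run of word characters at the end of the string.
stripped <- sub("(.*)\\.\\w+$", "\\1", df1$file_name)
print(stripped)
```

A value without any dot does not match the pattern, so it should come back unchanged.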

Demo here:

Rextester

Tim Biegeleisen

We can use sub:

df1$file_name <- sub("\\.xlsx$", "", df1$file_name)

Or use file_path_sans_ext from the tools package:

df1$file_name <- tools::file_path_sans_ext(df1$file_name)
df1$file_name
#[1] "01.01.2017 -SS DPR" "02.01.2017 -SS DPR" "03.01.2017 -SS DPR"
#[4] "04.01.2017 -SS DPR" "05.01.2017 -SS DPR" "06.01.2017 -SS DPR"
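
For comparison, here is a small self-contained sketch that runs both approaches on two of the sample values, plus a fixed-width `substr` cut in the spirit of the question's "last characters" wording. The `$` anchor in the `sub` pattern and the `substr` variant are additions for illustration, and the fixed-width cut assumes every value ends in ".xlsx" (5 characters including the dot).

```r
# Two sample values from the question; the data frame name df1 follows the answer.
df1 <- data.frame(
  file_name = c("01.01.2017 -SS DPR.xlsx", "02.01.2017 -SS DPR.xlsx"),
  stringsAsFactors = FALSE
)

# Option 1: anchored sub(), removes only a trailing ".xlsx".
by_sub <- sub("\\.xlsx$", "", df1$file_name)

# Option 2: tools::file_path_sans_ext(), works for any alphanumeric extension.
by_tools <- tools::file_path_sans_ext(df1$file_name)

# Option 3: fixed-width cut; valid only if every value ends in a 5-character suffix.
by_substr <- substr(df1$file_name, 1, nchar(df1$file_name) - 5)

print(by_sub)
print(identical(by_sub, by_tools) && identical(by_tools, by_substr))
```

The fixed-width cut is the most fragile of the three, since a value ending in ".xls" or ".csv" would lose one character of its name.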
akrun