I'm trying to compose a sed command that removes all trailing extensions from a file name, including when there are several in sequence separated by '.', e.g.:

/a/b/c.gz -> /a/b/c
/a/b/c.tar.gz -> /a/b/c rather than /a/b/c.tar   

Notice that only the file name should be truncated; dots in parent directories must be preserved:

/a/b.c/d.tar.gz -> /a/b.c/d

never /a/b.c/d.tar, /a/b or /a/b/d.

Therefore, simply removing everything after the first '.' is not a solution.

I have a command that works as long as there is at least one '/' in the file name (or rather, path). I'm not sure how to extend it to also cover the single-element (file name only) case:

sed 's/^\(.*\/[^.\/]*\)[^\/]*$/\1/' list_of_filepaths.txt \
   > output_filepaths_wo_extensions.txt

So, the command above does the right thing with:

./abc.tar.gz, parent/.../abc.tar.gz, /abc.tar.gz 

It does not work for the single-element (file name only) case:

abc.tar.gz

Of course, this is not surprising, since the pattern requires a '/' and there is none to match.
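
The behavior described above can be reproduced by feeding the sample paths through the command. This is only a test sketch: the wrapper name `strip_ext_v1` is made up for illustration, and the inputs are the examples from this question.

```shell
# Hypothetical wrapper around the command shown above (the name is illustrative).
strip_ext_v1() {
  sed 's/^\(.*\/[^.\/]*\)[^\/]*$/\1/'
}

# Paths containing a '/' are truncated as intended;
# the slash-free name passes through unchanged.
printf '%s\n' ./abc.tar.gz /abc.tar.gz /a/b.c/d.tar.gz abc.tar.gz | strip_ext_v1
# prints:
#   ./abc
#   /abc
#   /a/b.c/d
#   abc.tar.gz
```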

Although adding a second sed command to handle the slash-free case is trivial, I would like to cover all cases with a single command, as it seems that should be possible.

For example, I was hoping that this one would work, but it does not work for either case:

sed 's/^\(.*?\/\)?\([^.\/]*\)[^\/]*$/\1\2/'

In this attempt, the first (additional) group would capture the optional prefix up to and including the last '/'. For a slash-free file path, that group would simply be empty.
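
One likely reason the attempt changes nothing, assuming a sed that uses POSIX basic regular expressions by default: an unescaped `?` is an ordinary character there, so the pattern only matches lines that contain literal question marks. The sketch below illustrates this; the wrapper name `attempt` and the contrived third input are invented for the demonstration.

```shell
# Hypothetical wrapper around the attempted command (the name is illustrative).
attempt() {
  sed 's/^\(.*?\/\)?\([^.\/]*\)[^\/]*$/\1\2/'
}

# Neither ordinary path contains a literal '?', so neither line is changed.
printf '%s\n' abc.tar.gz /a/b/abc.tar.gz | attempt
# prints:
#   abc.tar.gz
#   /a/b/abc.tar.gz

# A contrived input with literal '?' characters does match, which shows
# that '?' is being read as a plain character rather than as "optional".
printf '%s\n' 'x?/?abc.tar.gz' | attempt
# prints:
#   x?/abc
```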

Mark Rotteveel
Valentin Ruano
  • With your shown samples, how about using something like `sed 's/\..*//' Input_file`? – RavinderSingh13 Oct 07 '22 at 02:32
  • @RavinderSingh13 good try... I guess I didn't add enough examples. That would not work with `/a/b.c/d.e.g`: you would get `/a/b` instead of `/a/b.c/d`. Will add the example. – Valentin Ruano Oct 07 '22 at 06:05
  • `.*?` is not supported in `sed`; you want `[^/]*` (and probably switch to a different delimiter so you don't have to backslash-escape all the literal slashes). A more efficient (but more verbose) solution would be to use the shell's built-in [parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) facilities. – tripleee Oct 07 '22 at 06:26
  • @tripleee sounds promising, could you promote your comment to a fully fledged answer? – Valentin Ruano Oct 07 '22 at 06:30
  • I added a second duplicate which covers this ground. – tripleee Oct 07 '22 at 06:34
  • @tripleee not sure that is the same situation; there are similarities, but it is not quite the same. Notice that the part matching up to the '/' from the left must actually be greedy (to preserve all parent dirs), not non-greedy as in that example. Otherwise, if you see how that other question gives a straightforward (trivial) answer to this one, could you please post a full working sed command? – Valentin Ruano Oct 07 '22 at 06:48
  • `dir=${file%/*}; file=${file##*/}; file=$dir/${file%%.*}` – tripleee Oct 07 '22 at 06:53
  • @tripleee is there a single sed command solution as per the question, or must one delegate to bash (or perl)? – Valentin Ruano Oct 07 '22 at 06:56
  • You have it wrong, you are already running the shell and you don't want to delegate to `sed`. – tripleee Oct 07 '22 at 06:57
  • @tripleee perhaps it's just me, but I like one-liners that don't rely on stuffing several commands onto a line separated by ';'. In my case I'm going to get a few thousand of these file names as the output of another command. Piping them into sed (with a single command if possible) is more elegant and less cumbersome than loading them into variables within a bash read loop, or using xargs to call another three-line bash script. – Valentin Ruano Oct 07 '22 at 07:08
  • If your `sed` supports `-E` or `-r`, try `sed -E 's%(.*/)?([^/.]*)\..*%\1\2%'`. If not, maybe try `\?`, but e.g. Mac `sed` does not support that at all; you can fake it with `sed 's%\(.*/\)*\([^./]*\)\..*%\1\2%'` – tripleee Oct 07 '22 at 07:08
  • Your question exhibits a simple `echo` but for filtering a pipe, `sed` is definitely preferred. – tripleee Oct 07 '22 at 07:09
  • @tripleee yes, the echo is just a convenient way to test if it works, and to show the problem in the simplest terms. – Valentin Ruano Oct 07 '22 at 07:10
  • I guess I can change the question to show why a sed command would be preferable. – Valentin Ruano Oct 07 '22 at 07:13
  • @tripleee FYI, your bash solution above results in some funky edge-case behavior. A simple file name with neither extension nor parent dirs ends up duplicated with a '/' in between, e.g. 'hello' -> 'hello/hello'. – Valentin Ruano Oct 07 '22 at 07:24
  • @tripleee however the sed for mac solution is working, thanks very much. – Valentin Ruano Oct 07 '22 at 07:30
  • a bash solution, using parameter expansion, is here: https://stackoverflow.com/a/965069/724039 – Luuk Oct 10 '22 at 18:41
  • @Luuk that does not work with the case `/a/b.c/d.e.f`, as we want `/a/b.c/d` but what I would get is `/a/b`; dots in parent directories should be unchanged. – Valentin Ruano Oct 14 '22 at 14:06
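
Pulling the thread together, here is a minimal sketch of two approaches; neither is taken verbatim from the comments. The first rests on an assumption of mine: since sed picks the leftmost match, anchoring a slash-free run to the end of the line should be enough to find the first dot of the last path component, with no optional prefix group needed. The second adapts the parameter-expansion idea so that a bare file name is not duplicated. The function names `strip_ext_sed` and `strip_ext_sh` are hypothetical.

```shell
# Sketch 1: a single sed substitution, suitable for filtering a pipe.
# '\.[^/]*$' matches from a dot to the end of the line with no '/' in between;
# the leftmost such match starts at the first dot of the last path component.
strip_ext_sed() {
  sed 's%\.[^/]*$%%'
}

# Sketch 2: pure-shell variant based on parameter expansion. The directory
# part is only split off when the path actually contains a '/'.
strip_ext_sh() {
  path=$1
  case $path in
    */*) dir=${path%/*}/; base=${path##*/} ;;
    *)   dir=;            base=$path ;;
  esac
  # Drop everything from the first dot of the file name onwards.
  printf '%s\n' "$dir${base%%.*}"
}

printf '%s\n' /a/b.c/d.tar.gz abc.tar.gz hello /a/b.c/d | strip_ext_sed
# prints:
#   /a/b.c/d
#   abc
#   hello
#   /a/b.c/d

strip_ext_sh hello
# prints: hello
```

Both sketches treat a leading dot as the start of an extension, so a hidden file such as `.bashrc` would be reduced to an empty name; that case would need separate handling if it can occur in the input.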

0 Answers