Remove duplicate filename extensions

Question

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz

I am using the find command like this: find . -name "*.gz*" to locate these files, and I would like to use either -exec or a pipe to xargs with some magic command to clean up this mess, so that I end up with filename.gz

Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)

Another question: is the .gz extension repeated a fixed number of times? — justinpage, Mar 09 '14 at 20:15
Is this a good idea? How was your `filename.gz.gz` created? `gzip` has guards against accidentally creating it. If you circumvent these via something like `gzip -c $1 > $1.gz`, buried in some script, then renaming your files will give you grief. — Joseph Quinsey, Mar 09 '14 at 21:15

score 4 · Answer 1 · answered Mar 09 '14 at 20:39

One way, with find and awk:

find "$(pwd)" -name '*.gz.gz' | awk '{n=$0; sub(/(\.gz)+$/, ".gz", n); print "mv", $0, n}' | sh

Notes:

I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the generated mv command.
I added $(pwd) so that find prints the absolute path of each file it finds.
You can remove the trailing |sh to check that the generated mv commands are correct.
If everything looks good, add the |sh back to execute them.
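
If the filenames may contain spaces or other special characters, one way to avoid the quoting problem altogether is to skip the generated mv text and let find hand the names to a small inline bash script. This is only a sketch, not part of the original answer: the function name dedupe_gz_tree is made up for illustration, and mv -n (do not overwrite an existing target) is assumed to be available, as it is in GNU and BSD mv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse a trailing ".gz.gz..." run to a single ".gz"
# for every regular file under the directory given as $1 (default: .).
dedupe_gz_tree() {
    find "${1:-.}" -type f -name '*.gz.gz' -exec bash -c '
        for old; do
            new=$old
            # Drop one trailing ".gz" at a time while at least two remain.
            while [[ $new == *.gz.gz ]]; do
                new=${new%.gz}
            done
            # -n (assumed: GNU or BSD mv) refuses to overwrite an existing file.
            mv -n -- "$old" "$new"
        done
    ' _ {} +
}
```

Calling dedupe_gz_tree . would process the current directory tree. Replacing mv -n with echo mv gives a preview first.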

See the example here: [animated GIF demonstrating the command]

Answered by Kent

Sorry for an off-topic comment but I have to ask, what did you use to record that animated GIF, `byzanz`? It looks really nice. – nwk Mar 09 '14 at 21:14
@nwk yes, it is byzanz, with my own wrapper. https://github.com/sk1418/myScripts/blob/master/shell/recWin.sh – Kent Mar 10 '14 at 09:43

score 0 · Answer 2 · answered Mar 09 '14 at 20:18

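You may use

ls a.gz.gz.gz | sed -r 's/(\.gz)+$/.gz/'

or, without the extended-regex flag,

ls a.gz.gz.gz | sed 's/\(\.gz\)\+$/.gz/'

Either command only prints the cleaned-up name; it does not rename the file. The trailing $ restricts the match to the .gz sequence at the end of the name.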

Answered by grodzi

Jakub M. · Answer 3 · 2014-03-09T20:37:10.973

ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'

This only prints shell commands to rename your files; it does not execute them, so it is safe to run. To execute them, you can save the output to a file and run that, or simply pipe it to the shell:

ls *.gz | ... | sh

sed is great for replacing text inside files.

edited Mar 09 '14 at 20:37

answered Mar 09 '14 at 20:31


mklement0 · Answer 4 · 2014-03-09T22:04:49.227

find . -name "*.gz.gz" |
 while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done

This only previews the renaming (mv) command; remove the echo to perform actual renaming.

Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
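
The effect of anchoring the match with $ can be seen on a name that contains .gz somewhere in the middle. A small sketch, using a made-up filename and sed -E (assumed here as the spelling of -r that both GNU and BSD sed accept):

```shell
#!/usr/bin/env bash
name='my.gz.backup.gz.gz'   # hypothetical filename

# Unanchored: the first ".gz" run (right after "my") is matched and replaced
# by itself, so the duplicate at the end survives.
unanchored=$(printf '%s\n' "$name" | sed -E 's/(\.gz)+/.gz/')

# Anchored with $: only the run of ".gz" at the very end is collapsed.
anchored=$(printf '%s\n' "$name" | sed -E 's/(\.gz)+$/.gz/')

printf 'unanchored: %s\nanchored:   %s\n' "$unanchored" "$anchored"
```

The unanchored expression leaves the name as my.gz.backup.gz.gz, while the anchored one yields my.gz.backup.gz.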

score 0 · Answer 5 · edited Mar 09 '14 at 21:40

You can do that with bash string substitution:

for file in *.gz.gz; do
    mv "${file}" "${file%%.*}.gz"
done

edited Mar 09 '14 at 21:40 by mklement0

answered Mar 09 '14 at 21:15 by jaypal singh

score 0 · Answer 6 · edited May 23 '17 at 10:33

Using bash string substitution:

for f in *.gz.gz; do
    mv "$f" "${f%%.gz.gz*}.gz"
done

This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either.) Note the use of double quotes, which protect against filenames with "bad" characters, such as spaces or stars.
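
To see where the two expansions differ, here is a small comparison on made-up names. The function names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# ${1%%.*} removes the longest suffix starting at a ".", i.e. it cuts at the
# FIRST dot in the name.
strip_at_first_dot()  { printf '%s\n' "${1%%.*}.gz"; }

# ${1%%.gz.gz*} removes the longest suffix starting at ".gz.gz", i.e. it cuts
# at the FIRST occurrence of ".gz.gz" in the name.
strip_at_first_pair() { printf '%s\n' "${1%%.gz.gz*}.gz"; }

for f in foo.gz.gz foo.c.gz.gz a.gz.gz.b.gz.gz; do
    printf '%s -> %s | %s\n' "$f" "$(strip_at_first_dot "$f")" "$(strip_at_first_pair "$f")"
done
```

For foo.c.gz.gz the first form gives foo.gz (the .c is lost) and the second gives foo.c.gz. For a.gz.gz.b.gz.gz both give a.gz, which is the "not perfect" case: everything after the first .gz.gz is dropped.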

If you wish to use find to process an entire directory tree, the variant is:

find . -name \*.gz.gz | \
while read f; do
    mv "$f" "${f%%.gz.gz*}.gz"
done

And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
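
Put together, the newline-safe variant could look roughly like this. It is a sketch under the assumptions that bash is the shell and that find supports -print0 (GNU and BSD find do). The function name is made up, and read -d '' is used as an equivalent spelling of -d $'\0'.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the find | while read loop, using NUL-delimited
# names so that embedded newlines and leading/trailing spaces survive.
rename_gz_tree0() {
    local f
    find "${1:-.}" -type f -name '*.gz.gz' -print0 |
    while IFS= read -r -d '' f; do
        # Same expansion as in the answer: cut at the first ".gz.gz".
        mv -- "$f" "${f%%.gz.gz*}.gz"
    done
}
```

Because the expansion is applied to the whole path, this inherits the limitation already mentioned: a directory name containing .gz.gz would also be cut.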

But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
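
One way to check whether a file really was compressed more than once, before renaming anything, is to decompress it once and look at the first two bytes of the result: gzip data starts with the magic bytes 1f 8b. A rough sketch, assuming gzip, head -c, od and tr are available; the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds if decompressing "$1" once yields data that is
# itself gzip-compressed (starts with the gzip magic bytes 1f 8b).
is_double_gzipped() {
    [ "$(gzip -dc -- "$1" 2>/dev/null | head -c 2 | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}
```

If is_double_gzipped filename.gz.gz succeeds, the file needs one gunzip per extra layer rather than a rename.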

While this is indeed an improvement over stripping the extensions with `%%.*`, it could still strip too much, e.g. with `somefile.gz.gz.other.gz.gz`. Also, as with many other answers here, you only process files located _directly_ in the current directory, whereas the OP - due to use of `find` - processes the entire directory _subtree_. — mklement0, Mar 09 '14 at 21:50
@mklement0: Agreed. Wrt `.other.` I had already added a comment. Will add something like `find ... -print0 | ...` shortly. — Joseph Quinsey, Mar 09 '14 at 21:54

score 0 · Answer 7 · answered Mar 09 '14 at 21:53

This might work for you (GNU sed):

printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'

Answered by potong

score 0 · Answer 8 · answered Mar 10 '14 at 04:48

Another way with rename:

find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +

When you are happy with the results, remove the -n (dry-run) option.

Answered by lind
