47

I know this question has been asked many times on this site, but the existing answers miss an important point: they only consider file extensions with a single period, such as *.png or *.mp3. How do I deal with filenames that have two periods, such as .tar.gz?

The basic code is:

filename = '/home/lancaster/Downloads/a.ppt'
extention = filename.split('/')[-1]

But obviously this code does not work for a file like a.tar.gz. How should I deal with it? Thanks.

Alfabravo
Page David
  • By the way in your code, the variable `extension` is actually storing the complete filename, it does the same thing regardless of the type of extension. – mastazi Jun 18 '16 at 11:31
  • your code is a wrong example: it's giving you the " basename " of the path, not the extension, which is equivalent to `import os;os.path.basename('/home/lancaster/Downloads/a.ppt')` – zmo Jun 18 '16 at 11:32
  • a.tar.gz is a "gzip-compressed" tar file. So the extension of this file is `gz` and not `tar.gz`. So this question comes down to finding substrings ".tar.gz" etc, in the file names. If you see Rahul's edit, you will find that this is true. – gaganso Jun 18 '16 at 11:38
  • Hey guys, I found a more interesting case: if I compress `a.ppt`, the default filename will be `a.ppt.tar.gz`, which complicates things further. Please take this into consideration. – Page David Jun 18 '16 at 11:38
  • @SilentMonk But if I rename `a.tar.gz` to `a.tar(2).gz`, I cannot open it properly, so the extension is `tar.gz`. – Page David Jun 18 '16 at 11:42
  • You can open `a.tar(2).gz` just fine: `gunzip` will work fine on that file. –  Jun 18 '16 at 11:59
  • @Evert But I got `unknown suffix -- ignored` from `gunzip a.ppt.tar\(2\)`. What goes wrong? – Page David Jun 18 '16 at 12:16
  • Obviously, you should *not* run `gunzip` on `a.ppt.tar\(2\)`, but on `a.ppt.tar\(2\).gz`... –  Jun 18 '16 at 17:42
  • @Evert sorry, misunderstood your suggestion. – Page David Jun 19 '16 at 01:11
  • Possible duplicate of [Extracting extension from filename in Python](http://stackoverflow.com/questions/541390/extracting-extension-from-filename-in-python) – Trevor Boyd Smith Mar 15 '17 at 13:13
  • @Rahul 's answer is the better way. i.e using `os.path.splitext('/path/to/your/file')` If you need a one line code you can use something like `(os.path.splitext('path/to/file.ext')[1]).split('.')[1]` – Tharanga Jun 22 '17 at 06:35

8 Answers

100

Python 3.4+

You can now use Path from pathlib. Among its many features is the suffix property:

>>> from pathlib import Path
>>> Path('my/library/setup.py').suffix
'.py'
>>> Path('my/library.tar.gz').suffix
'.gz'
>>> Path('my/library').suffix
''

If you want to get more than one suffix, use suffixes:

>>> from pathlib import Path
>>> Path('my/library.tar.gar').suffixes
['.tar', '.gar']
>>> Path('my/library.tar.gz').suffixes
['.tar', '.gz']
>>> Path('my/library').suffixes
[]
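If the goal is a single string such as '.tar.gz', the entries of suffixes can be joined. The sketch below is one possible way to do that; the helper name `combined_suffix` and the sample paths are made up for illustration and are not part of pathlib. It inherits the behaviour of suffixes, so every dot in the file name counts, as the last call shows.

```python
from pathlib import Path


def combined_suffix(path):
    """Join every suffix of the final path component into one string.

    Hypothetical helper for illustration, not part of pathlib.
    """
    return "".join(Path(path).suffixes)


print(combined_suffix("my/library.tar.gz"))    # .tar.gz
print(combined_suffix("my/library"))           # prints an empty line
print(combined_suffix("my.dir/setup.py"))      # .py (directory dots are ignored)
print(combined_suffix("report.11.2019.docx"))  # .11.2019.docx
```

Because dots inside the name are indistinguishable from extension separators, this only gives the expected result when the naming scheme is known in advance.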
Or Duan
  • Something that frustrated me: I googled for `Path` from pathlib documentation, and it told me to use .ext . This kept failing and eventually I learned of .suffix. Not sure if the first documentation was for an older version of the Path() library? – Starman Jan 22 '18 at 20:59
  • @Starman `pathlib` was introduced in Python 3.4; make sure you're not looking at Python 2 docs. – Or Duan Jan 22 '18 at 21:26
26

There is a built-in function for this in the os module: os.path.splitext.

In [1]: from os.path import splitext
In [2]: file_name,extension = splitext('/home/lancaster/Downloads/a.ppt')
In [3]: extension
Out[3]: '.ppt'

If you have to find extensions such as .tar.gz or .tar.bz2, you have to write a function like this:

from os.path import splitext
def splitext_(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)

Result

In [4]: file_name,ext = splitext_('/home/lancaster/Downloads/a.tar.gz')
In [5]: ext
Out[5]: '.tar.gz'

Edit

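A more general version of this function:

from os.path import basename, splitext
def splitext_(path):
    # Inspect only the file name, so dots in directory names are ignored
    parts = basename(path).rsplit('.', 2)
    if len(parts) > 2:
        ext = '.' + '.'.join(parts[1:])  # keep the leading dot, as splitext does
        return path[:-len(ext)], ext
    return splitext(path)

It treats the last two dot-separated parts of the file name as the extension, so it handles any two-part extension, but it will also report .file.txt for a name such as my.file.txt.

Trying it on several files:

In [6]: inputs = ['a.tar.gz', 'b.tar.lzma', 'a.tar.lz', 'a.tar.lzo', 'a.tar.xz', 'a.png']
In [7]: for file_ in inputs:
   ...:     file_name, extension = splitext_(file_)
   ...:     print(extension)
   ...:
.tar.gz
.tar.lzma
.tar.lz
.tar.lzo
.tar.xz
.png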
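An alternative that avoids guessing is to treat a two-part extension as such only when the last part is a known compression suffix and the part before it is .tar. The following is a minimal sketch using pathlib (Python 3.4+); the set `COMPRESSION_SUFFIXES` and the function name `split_archive_ext` are assumptions made for this example, and the set is not meant to be exhaustive.

```python
from pathlib import Path

# Assumed, non-exhaustive set of compression suffixes (illustrative only).
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".lz", ".lzma", ".lzo"}


def split_archive_ext(path):
    """Return (root, ext); ext is '.tar.<comp>' only for compressed tarballs."""
    suffixes = Path(path).suffixes
    if (len(suffixes) >= 2
            and suffixes[-1].lower() in COMPRESSION_SUFFIXES
            and suffixes[-2].lower() == ".tar"):
        ext = "".join(suffixes[-2:])
    else:
        # Fall back to the single last suffix ('' if there is none).
        ext = Path(path).suffix
    root = path[:len(path) - len(ext)]
    return root, ext


print(split_archive_ext("a.ppt.tar.gz"))  # ('a.ppt', '.tar.gz')
print(split_archive_ext("my.file.txt"))   # ('my.file', '.txt')
print(split_archive_ext("README"))        # ('README', '')
```

The trade-off is that unknown compression formats fall back to a single suffix until they are added to the set.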
Rahul K P
9

The role of a file extension is to tell the viewer (and sometimes the computer) which application to use to handle the file.

Taking your worst-case example in your comments (a.ppt.tar.gz), this is a PowerPoint file that has been tar-balled and then gzipped. So you need to use a gzip-handling program to open it. Using PowerPoint or a tarball-handling program wouldn't work. OK, a clever program that knew how to handle both .tar and .gz files could understand both operations and work with a .tar.gz file - but note that it would do that even if the extension was simply .gz.

The fact that both tar and gzip add their extensions to the original filename, rather than replace them (as zip does) is a convenience. But the base name of the gzip file is still a.ppt.tar.
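This layering can be made visible by peeling off one suffix at a time. A minimal sketch, assuming Python 3.4+ for pathlib; the function name `peel_suffixes` is hypothetical and chosen only for this illustration.

```python
from pathlib import Path


def peel_suffixes(filename):
    """Yield (suffix, remaining name) pairs, outermost layer first."""
    name = Path(filename).name
    while Path(name).suffix:
        suffix = Path(name).suffix
        name = Path(name).stem  # the name with its last suffix removed
        yield suffix, name


for suffix, rest in peel_suffixes("a.ppt.tar.gz"):
    print(suffix, "->", rest)
# .gz -> a.ppt.tar
# .tar -> a.ppt
# .ppt -> a
```

Each step corresponds to one tool: gunzip removes .gz, tar unpacks .tar, and only then is the .ppt file available.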

John Burger
  • @David, this is what I tried to convey in the comment. – gaganso Jun 18 '16 at 11:48
  • Thanks for your explanation. Based on this idea, I think @no11's code is simple and good enough for me, but I still think yours is the best answer. – Page David Jun 18 '16 at 12:13
  • @Rahul 's answer is the better way. i.e using `os.path.splitext('/path/to/your/file')` If you need a one line code you can use something like `(os.path.splitext('path/to/file.ext')[1]).split('.')[1]` – Tharanga Jun 22 '17 at 06:34
3

The simplest one:

import os.path
print(os.path.splitext("/home/lancaster/Downloads/a.ppt")[1])
# .ppt
Saket Mittal
2

One possible way is:

  1. Split at "." => tmp_ext = filename.split('.')[1:]

Result is a list = ['tar', 'gz']

  2. Join them together => extention = ".".join(tmp_ext)

Result is your extension as a string = 'tar.gz'

Update: Example:

>>> test = "/test/test/test.tar.gz"
>>> t2 = test.split(".")[1:]
>>> t2
['tar', 'gz']
>>> ".".join(t2)
'tar.gz'
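The same idea can be made a little more robust by looking only at the last path component, so that dots in directory names are ignored, and by using str.partition, which yields an empty string when there is no dot. This is only a sketch; the function name `ext_after_first_dot` and the example paths are hypothetical.

```python
import os.path


def ext_after_first_dot(path):
    """Return everything after the first dot of the file name, or ''."""
    name = os.path.basename(path)
    # partition() returns (head, sep, tail); tail is '' when no dot exists.
    return name.partition(".")[2]


print(ext_after_first_dot("/test/test/test.tar.gz"))         # tar.gz
print(ext_after_first_dot("/home/user.name/a.ppt"))          # ppt
print(ext_after_first_dot("/home/user.name/README"))         # prints an empty line
print(ext_after_first_dot("my_document_on11.11.2019.docx"))  # 11.2019.docx
```

As the last call shows, file names that contain dots for other reasons are still misread, so this approach only suits names known to be free of such dots.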
Rahul K P
no11
  • this will crash for filenames without a dot but except for that it will give what the OP asked for – Maximilian Peters Jun 18 '16 at 11:42
  • Ok. But for that case he already has a solution. So the core-question would be solved with my solution. But okay... – no11 Jun 18 '16 at 11:44
  • Although I found this solution pretty smart, it does not work for some cases. For example, some users put dots in the file name: 'my_document_on11.11.2019.docx'. – vmontazeri Apr 19 '19 at 04:06
0
>>> import os
>>> import re

>>> basename = os.path.basename('/home/lancaster/Downloads/a.ppt')
>>> re.findall(r'\.([^.]+)', basename)
['ppt']

>>> basename = os.path.basename('/home/lancaster/Downloads/a.ppt.tar.gz')
>>> re.findall(r'\.([^.]+)', basename)
['ppt', 'tar', 'gz']
matt
0
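With re.findall and Python 3.6 (f-strings). The pattern matches a dot followed by one to six word characters, either at a word boundary or at the end of the string; re.X lets the pattern contain whitespace for readability: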

filename = '/home/Downloads/abc.ppt.tar.gz'

ext = r'\.\w{1,6}'

re.findall(f'{ext}\\b | {ext}$', filename,  re.X)

['.ppt', '.tar', '.gz']
LetzerWille
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – rollstuhlfahrer Feb 24 '18 at 00:21
-1
filename = '/home/lancaster/Downloads/a.tar.gz'
extention = filename.split('/')[-1]

if '.' in extention:
  extention = extention.split('.')[-1]
  if len(extention) > 0:
    extention = '.'+extention
    print(extention)