1

I have paths like these:

dir1/dir2/.log.gz
dir1/dir2/a.log.gz
dir1/dir2/a.py
dir1/dir2/*.gzip.tar

I want to split each one into the path (or file name) and the full extension, e.g.:

(name,extension)=(dir1/dir2/,.log.gz)
(name,extension)=(dir1/dir2/a,.log.gz)
(name,extension)=(dir1/dir2/a,.py)
(name,extension)=(dir1/dir2/,.gzip.tar)

I tried:

re.findall(r'(.*).*\.?(.*)',path)

but it doesn't work correctly.

  • @cᴏʟᴅsᴘᴇᴇᴅ I don't think it is an **exact** duplicate, as this question wants to split all extensions, e.g. `.log.gz`, from the name, not just the last one. – bastelflp Nov 11 '17 at 20:07

2 Answers

4

If you just want the file's name and extension:

import os

path = "C:/Users/Me/some_file.tar.gz"
# splitext splits off only the last extension
root, ext = os.path.splitext(path)
var = (os.path.basename(root), ext)
print(var)
# ('some_file.tar', '.gz')

It's worth noting that for files with "dual" extensions you need to apply os.path.splitext repeatedly if you want every extension. For example, a .tar.gz file is a gzip-compressed file that happens to contain a tar archive, but its current, outermost format is .gz.
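One way to handle "dual" extensions is to keep calling os.path.splitext until nothing more comes off. The following is a minimal sketch, not a definitive implementation; the helper name split_all_extensions is made up for illustration.

```python
import os


def split_all_extensions(path):
    """Peel extensions off the end of path one at a time.

    Hypothetical helper: returns (root, extensions), where extensions
    is every trailing suffix joined together, e.g. '.log.gz'.
    """
    root, ext = os.path.splitext(path)
    extensions = []
    while ext:
        # Each pass removes the outermost remaining extension.
        extensions.insert(0, ext)
        root, ext = os.path.splitext(root)
    return root, "".join(extensions)


print(split_all_extensions("dir1/dir2/a.log.gz"))
# ('dir1/dir2/a', '.log.gz')
```

Note that os.path.splitext treats a leading dot in the file name as part of the name, not as an extension, so dir1/dir2/.log.gz splits into dir1/dir2/.log and .gz, which differs from the result the question asks for in that case.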

There is more on this topic here on SO.

pstatix
1

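General strategy: find the first '.' in the file name. Everything before it is the path, and everything from it onward is the extension.

def get_path_and_extension(filename):
    # Search only after the last '/' so dots in directory names are ignored.
    index = filename.find('.', filename.rfind('/') + 1)
    if index == -1:  # no dot in the file name: no extension
        return filename, ''
    return filename[:index], filename[index:]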
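Since the question started from a regex, the same first-dot idea can also be sketched with re.fullmatch. This is one possible pattern under stated assumptions: paths use '/' as the separator, and the extension is returned with its leading dot. The function name split_at_first_dot is hypothetical.

```python
import re

# Group 1: optional directories (up to the last '/') plus the part of the
#          file name before its first dot.
# Group 2: the first dot of the file name and everything after it, or ''.
PATH_EXT = re.compile(r'((?:.*/)?[^./]*)((?:\.[^/]*)?)')


def split_at_first_dot(path):
    """Hypothetical helper: split path at the first dot of its last component."""
    match = PATH_EXT.fullmatch(path)
    if match is None:
        # Not expected for single-line paths; fall back to "no extension".
        return path, ''
    return match.group(1), match.group(2)


for p in ("dir1/dir2/.log.gz", "dir1/dir2/a.log.gz", "dir1/dir2/a.py"):
    print(split_at_first_dot(p))
# ('dir1/dir2/', '.log.gz')
# ('dir1/dir2/a', '.log.gz')
# ('dir1/dir2/a', '.py')
```

Because neither the file-name part nor the extension part may contain '/', dots in directory names (for example dir.1/a.py) are left in the path rather than treated as the start of the extension.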