12

I have got the complete path of files in a list like this:

a = ['home/robert/Documents/Workspace/datafile.xlsx', 'home/robert/Documents/Workspace/datafile2.xls', 'home/robert/Documents/Workspace/datafile3.xlsx']

What I want is just the file names without their extensions, like:

b = ['datafile', 'datafile2', 'datafile3']

What I have tried is:

import os
import re

xfn = re.compile(r'(\.xls)+')
b = []
for name in a:
    fp, fb = os.path.split(name)
    ofn = xfn.sub('', fb)
    b.append(ofn)

But it results in:

b = ['datafilex', 'datafile2', 'datafile3x']
Cœur
MHS

4 Answers

27
  1. The regex you've used is wrong. (\.xls)+ matches one or more repetitions of .xls (.xls, .xls.xls, etc.), so in the .xlsx items only the .xls part is removed and the trailing x remains. What you want is \.xls.*, i.e. .xls followed by zero or more of any character.

  2. You don't really need a regex. There are specialized functions in os.path that deal with this: basename and splitext.

    >>> import os.path
    >>> os.path.basename('home/robert/Documents/Workspace/datafile.xlsx')
    'datafile.xlsx'
    >>> os.path.splitext(os.path.basename('home/robert/Documents/Workspace/datafile.xlsx'))[0]
    'datafile'
    

    So, assuming you don't really care about the .xls/.xlsx suffix, your code can be as simple as:

    >>> a = ['home/robert/Documents/Workspace/datafile.xlsx', 'home/robert/Documents/Workspace/datafile2.xls', 'home/robert/Documents/Workspace/datafile3.xlsx']
    >>> [os.path.splitext(os.path.basename(fn))[0] for fn in a]
    ['datafile', 'datafile2', 'datafile3']
    

    (Also note the list comprehension.)
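If a regex is still preferred, here is a minimal sketch of point 1, assuming the only extensions of interest are .xls and .xlsx. It anchors the pattern at the end of the string, which is a slightly stricter variant of the \.xls.* suggested above, and the helper name strip_excel_ext is made up for illustration.

```python
import os.path
import re

# Matches a trailing ".xls" or ".xlsx" only (assumption: no other extensions matter).
XLS_EXT = re.compile(r'\.xlsx?$')

def strip_excel_ext(path):
    """Hypothetical helper: file name of `path` without a trailing .xls/.xlsx."""
    return XLS_EXT.sub('', os.path.basename(path))

a = ['home/robert/Documents/Workspace/datafile.xlsx',
     'home/robert/Documents/Workspace/datafile2.xls',
     'home/robert/Documents/Workspace/datafile3.xlsx']
b = [strip_excel_ext(p) for p in a]
print(b)  # ['datafile', 'datafile2', 'datafile3']
```

Unlike \.xls.*, the anchored pattern leaves a name such as notes.xls.bak untouched, which may or may not be what you want.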

gene_wood
kennytm
  • +1 for both correcting OP's incorrect approach and for providing a better solution to the problem –  Apr 06 '13 at 10:16
4

One-liner:

>>> filename = 'file.ext'
>>> '.'.join(filename.split('.')[:-1]) if '.' in filename else filename
'file'
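To see where the one-liner above and os.path.splitext agree and differ, here is a small sketch that runs both on a few made-up names. The function name strip_last_ext is hypothetical; it only wraps the expression shown above.

```python
import os.path

def strip_last_ext(filename):
    # Same expression as the one-liner above, wrapped in a hypothetical function.
    return '.'.join(filename.split('.')[:-1]) if '.' in filename else filename

# Illustrative names only; none of them come from the question.
for name in ['file.ext', 'archive.tar.gz', 'README', '.bashrc']:
    print(repr(name), repr(strip_last_ext(name)), repr(os.path.splitext(name)[0]))
# 'file.ext' 'file' 'file'
# 'archive.tar.gz' 'archive.tar' 'archive.tar'
# 'README' 'README' 'README'
# '.bashrc' '' '.bashrc'
```

The two only disagree on names that start with a dot: os.path.splitext treats a leading dot as part of the name, while the split-based version returns an empty string.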
Paulo Scardine
1

This is a duplicate of: How to get the filename without the extension from a path in Python?

https://docs.python.org/3/library/os.path.html

In Python 3 there is also pathlib ("The pathlib module offers high-level path objects."), so:

>>> from pathlib import Path
>>> p = Path("/a/b/c.txt")
>>> print(p.with_suffix(''))
\a\b\c
>>> print(p.stem)
c
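Applied to the list from the question, this might look like the sketch below. It assumes the paths are POSIX-style and uses PurePosixPath so the result is the same on every platform. The backslashes in the with_suffix output above suggest that session was run on Windows; on a POSIX system the same call would print /a/b/c.

```python
from pathlib import PurePosixPath

a = ['home/robert/Documents/Workspace/datafile.xlsx',
     'home/robert/Documents/Workspace/datafile2.xls',
     'home/robert/Documents/Workspace/datafile3.xlsx']

# .stem is the final path component without its last suffix.
b = [PurePosixPath(p).stem for p in a]
print(b)  # ['datafile', 'datafile2', 'datafile3']
```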
jjisnow
0

Why not just use the split method?

def get_filename(path):
    """ Gets a filename (without extension) from a provided path """

    filename = path.split('/')[-1].split('.')[0]
    return filename


>>> path = '/home/robert/Documents/Workspace/datafile.xlsx'
>>> filename = get_filename(path)
>>> filename
'datafile'
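One caveat worth checking before adopting this: split('.')[0] cuts at the first dot, not the last one. The sketch below compares it with os.path.splitext on a made-up file name; get_stem is a hypothetical helper added only for the comparison.

```python
import os.path

def get_filename(path):
    """Split-based version from above: keeps everything before the FIRST dot."""
    return path.split('/')[-1].split('.')[0]

def get_stem(path):
    """Hypothetical helper for comparison: drops only the LAST extension."""
    return os.path.splitext(os.path.basename(path))[0]

# Illustrative path, not taken from the question.
path = '/home/robert/Documents/Workspace/report.v2.xlsx'
print(get_filename(path))  # report
print(get_stem(path))      # report.v2
```

For the file names in the question both give the same result, so the split approach is fine as long as the names never contain extra dots and the separator is always '/'.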
Daniel Serodio
Amyth