22

I want to get a list of the file names of all PDF files in the folder that contains my Python script.

Now I have this code:

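import os

# every regular file in the current directory, whatever its extension
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    pass  # loop body not shown in the question

e = len(files) - 1  # drop the last entry, which is assumed to be the script itself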

The problem is that this code finds every file in the folder, including .py files. My "fix" is to name the script so that it is the last file in the folder (zzzz.py) and then subtract that last entry, my script, from the list.

I have tried many snippets to find only the .pdf files, but this is the closest I have come.

Bhargav Rao
Xavier Villafaina

6 Answers

23

Use the glob module:

>>> import glob
>>> glob.glob("*.pdf")
['308301003.pdf', 'Databricks-how-to-data-import.pdf', 'emr-dg.pdf', 'gfs-sosp2003.pdf']
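
The question asks for the folder that holds the script, which is not always the current working directory. A minimal sketch of the same glob idea for an arbitrary folder follows; the helper name `list_pdfs` is hypothetical, and sorting is added only because glob does not promise any particular order.

```python
import glob
import os


def list_pdfs(folder):
    """Return the sorted base names of the *.pdf entries directly inside folder."""
    # glob.escape protects special characters such as [ or * in the folder name
    pattern = os.path.join(glob.escape(folder), "*.pdf")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

Assuming the script is run as a file, `list_pdfs(os.path.dirname(os.path.abspath(__file__)))` would list the PDFs next to it.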
vy32
19

Use glob on the directory directly to find all your pdf files:

from os import path
from glob import glob  
def find_ext(dr, ext):
    return glob(path.join(dr,"*.{}".format(ext)))

Demo:

In [2]: find_ext(".","py")
Out[2]: 
['./server.py',
 './new.py',
 './ffmpeg_split.py',
 './clean_download.py',
 './bad_script.py',
 './test.py',
 './settings.py']

If you want the option of ignoring case:

from os import path
from glob import glob
def find_ext(dr, ext, ig_case=False):
    if ig_case:
        # turn "py" into "[pP][yY]" so the glob matches either case
        ext = "".join(["[{}]".format(
                ch + ch.swapcase()) for ch in ext])
    return glob(path.join(dr, "*." + ext))

Demo:

In [4]: find_ext(".","py",True)
Out[4]: 
['./server.py',
 './new.py',
 './ffmpeg_split.py',
 './clean_download.py',
 './bad_script.py',
 './test.py',
 './settings.py',
 './test.PY']
Padraic Cunningham
  • I believe you have one extra closing parenthesis after ch.swapcase on line 6 of your 2nd example. This is really great, thanks! – Paul Nov 08 '18 at 14:38
9

You can use endswith:

files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith('.pdf')]
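
Note that `os.path.isfile(f)` resolves the bare name against the current working directory, so the one-liner above only works for `'.'`. A hedged sketch for any folder, using `os.scandir` (Python 3.6+ for the `with` form), is shown here; the function name `pdf_names` is hypothetical and the case-insensitive match is an added assumption.

```python
import os


def pdf_names(folder="."):
    """Return sorted names of regular files in folder ending in .pdf (any case)."""
    with os.scandir(folder) as entries:
        return sorted(
            entry.name
            for entry in entries
            # entry.is_file() checks the entry itself, independent of the CWD
            if entry.is_file() and entry.name.lower().endswith(".pdf")
        )
```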
Ahsanul Haque
8

You simply need to filter the names of files, looking for the ones that end with ".pdf", right?

files = [f for f in os.listdir('.') if os.path.isfile(f)]
files = list(filter(lambda f: f.endswith(('.pdf', '.PDF')), files))

Now files contains only the names of files ending with .pdf or .PDF :)

Maciek
8

Python 3.4 and later: Use pathlib

Since Python 3.4, pathlib is usually the better choice, as it makes such tasks a lot simpler:

from pathlib import Path

root = "."  # take the current directory as root

for path in Path(root).glob("**/*.pdf"):
    print(path)

gives:

.pyenv/versions/3.8.10/lib/python3.8/site-packages/matplotlib/mpl-data/images/filesave.pdf
Downloads/2023-0310. Martin Thoma (1).pdf

So it descends recursively into subdirectories, including hidden ones. But it does NOT find foo.PDF: the match is case-sensitive (at least on POSIX systems; on Windows, pathlib matches case-insensitively).

If you need it to be case insensitive:

for path in Path(root).rglob('*'):  # iterate over all
    if path.suffix.lower() == ".pdf":  # check if the path pattern matches
        print(path)

Older than Python 3.4: Use os

To get all PDF files recursively:

import os

all_files = []
for dirpath, dirnames, filenames in os.walk("."):
    for filename in [f for f in filenames if f.endswith(".pdf")]:
        all_files.append(os.path.join(dirpath, filename))
Martin Thoma
0

You can also use the following:

files = filter(
    lambda f: os.path.isfile(f) and f.lower().endswith(".pdf"),
    os.listdir(".")
)
file_list = list(files)

Or, in one line:

list(filter(lambda f: os.path.isfile(f) and f.lower().endswith(".pdf"), os.listdir(".")))

Converting the filter object to a list with list() is optional; you can also iterate over it directly.

Georgios Syngouroglou