1

I need to write Python code that scans a folder for all files with certain extensions, such as .exe, .jpg, or .pdf.

Just like the Linux command "ls | grep *.pdf".

I've tried keeping a list of all the extensions I need and using regular expressions to search for them inside the folder, but I don't know what to pass to re.search().

I don't want to use something like the "os" library because this script needs to work on both Linux and Windows.

#!/usr/bin/python

import re

file_types = [".exe", ".jpg", ".pdf", ".png", ".txt"]

for line in file_types:
    # Do something like "ls | grep * + line"
    namefile = re.search(line, i_dont_know_what_to_put_here)
    print(namefile)

Update: Thanks for the help, everyone. I used the glob library and it works!

Peter S
  • `os.path.join` and `os.walk` are cross-platform; just don't hardcode paths, and use something like `os.path.join(os.path.dirname(__file__), "folder")` instead. – ipaleka Jun 27 '19 at 21:30

4 Answers

0

Try os.listdir():

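import os

# Extensions to keep, written without the leading dot
file_types = ["exe", "jpg", "pdf", "png", "txt"]
# Regular files in the current directory (subdirectories are skipped)
files = [f for f in os.listdir('.') if os.path.isfile(f)]
# filter on file type; splitext returns "" for names that have no extension
files = [f for f in files if os.path.splitext(f)[1].lstrip('.') in file_types]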

In general, the os and os.path modules are going to be very useful to you here. You could use a regex, but unless performance is very important I wouldn't bother.
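As a lighter alternative to both splitting on the dot and a regex, `str.endswith` accepts a tuple of suffixes. The following is only a minimal sketch, assuming the extensions are written with their leading dot and that matching should ignore case; `has_wanted_extension` and `list_matching` are hypothetical names used for illustration.

```python
import os

FILE_TYPES = (".exe", ".jpg", ".pdf", ".png", ".txt")


def has_wanted_extension(name, extensions=FILE_TYPES):
    """Return True if name ends with one of the extensions (case-insensitive)."""
    return name.lower().endswith(tuple(extensions))


def list_matching(folder=".", extensions=FILE_TYPES):
    """Return sorted names of regular files in folder with a wanted extension."""
    return sorted(
        name
        for name in os.listdir(folder)
        # os.path.join keeps the check correct when folder is not the cwd
        if os.path.isfile(os.path.join(folder, name))
        and has_wanted_extension(name, extensions)
    )
```

Calling `list_matching("some/folder")` would return just the file names; join them with the folder if full paths are needed.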

Tom Lubenow
  • As I said, I'd prefer not to use the os library. But thanks for the answer! – Peter S Jun 27 '19 at 21:32
  • The whole point of the os library is that it is the cross-platform way to use operating system resources like the file system. It checks if your system is Windows or Linux and uses the appropriate methods. – Tom Lubenow Jun 27 '19 at 21:34
0

My suggestion (it works on Windows, Linux, and macOS):

import os
file_types = [".exe", ".jpg", ".pdf", ".png", ".txt"]
files = [entry.path for entry in os.scandir('.') if entry.is_file() and os.path.splitext(entry.name)[1] in file_types]

or (if you want just filenames instead of full paths):

files = [entry.name for entry in os.scandir('.') if entry.is_file() and os.path.splitext(entry.name)[1] in file_types]
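
The same idea can be wrapped in a reusable function. This is only a sketch under a few assumptions: `scan_for_extensions` is a hypothetical name, extensions are compared case-insensitively (which the one-liners above do not do), and Python 3.6 or later is available so that `os.scandir` can be used as a context manager.

```python
import os


def scan_for_extensions(folder, extensions):
    """Return sorted names of regular files in folder whose extension is wanted."""
    # A set gives constant-time membership tests; lower() ignores case
    wanted = {ext.lower() for ext in extensions}
    with os.scandir(folder) as entries:  # releases the directory handle on exit
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file()
            and os.path.splitext(entry.name)[1].lower() in wanted
        )
```

For example, `scan_for_extensions('.', [".exe", ".jpg", ".pdf", ".png", ".txt"])` scans the current directory.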
Jānis Š.
0

Adding to the other comments here, if you still wish to use re, the way you should use it is:

re.search(<string to search for(regex)>, <string to search IN>)

So in your case, let's say you have filetype = ".pdf"; your code will be:

re.search(r".*\{}".format(filetype), filename)

where .* means "match any character 0 or more times", and the \ in front of ".pdf" makes the dot match a literal "." instead of acting as the regex wildcard, so the whole pattern means "the name contains .pdf". The r prefix makes the pattern a raw string, so Python passes the backslash through to the regex engine unchanged. You can also add a $ at the end of the regex to require that the name ends with the extension.
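
To check a name against the whole extension list in one call, the extensions can be combined into a single pattern. The sketch below is one way to do it, not the only one: it assumes case-insensitive matching is wanted, uses `re.escape` rather than a hand-written backslash, and `matches_extension` is a hypothetical helper name.

```python
import re

file_types = [".exe", ".jpg", ".pdf", ".png", ".txt"]

# re.escape makes each dot literal; "|" joins the alternatives;
# "$" requires the extension to sit at the end of the name.
pattern = re.compile(
    "(?:" + "|".join(re.escape(ext) for ext in file_types) + ")$",
    re.IGNORECASE,
)


def matches_extension(filename):
    """Return True if filename ends with one of the listed extensions."""
    return pattern.search(filename) is not None
```

The names to test still have to come from somewhere, for example a directory listing; the regex only decides whether a given name qualifies.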

And as mentioned here - os.listdir works perfectly fine for both Windows & Linux.

Hope that helps.

AvitanD
0

You can avoid the os module by using the glob module, which filters files by shell-style wildcard patterns such as *.py (these are not regular expressions).

from glob import glob
file_types = [".exe", ".jpg", ".pdf", ".png", ".txt"]
path = "path/to/files/*{}"


fnames = [fname for ext in file_types for fname in glob(path.format(ext))]

The equivalent loop is:

from glob import glob

file_types = [".exe", ".jpg", ".pdf", ".png", ".txt"]
path = "path/to/files/*{}"  # {} is replaced by each extension in turn
fnames = []
for ext in file_types:
    for fname in glob(path.format(ext)):
        fnames.append(fname)

EDIT: I'm not sure how this behaves across platforms, which some other answers have considered.

EDIT2: glob may have unexpected side effects when used on Windows; see "Getting Every File in a Windows Directory".
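
For readers who also want the path handling to stay platform-neutral, the standard-library pathlib module offers the same wildcard matching through `Path.glob`. This is a swapped-in technique rather than the glob-module code above, and only a sketch: `find_files` is a hypothetical name, and whether matching is case-sensitive is left to the platform default.

```python
from pathlib import Path


def find_files(folder, extensions):
    """Return sorted paths of regular files in folder matching the extensions."""
    folder = Path(folder)
    found = []
    for ext in extensions:
        # "*" + ".pdf" gives "*.pdf", a shell-style wildcard, not a regex
        found.extend(p for p in folder.glob("*" + ext) if p.is_file())
    return sorted(found)
```

A call such as `find_files("path/to/files", file_types)` returns `Path` objects; use `str(p)` or `p.name` where plain strings are needed.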

sin tribu