1

How do I iterate over only the text files within a directory? What I have so far is:

import glob

for file in glob.glob('*'):
    # the with block closes the file even if read() raises
    with open(file) as f:
        text = f.read()

This works, but I have to keep my .py file in the same directory (folder) to get it to run, and as a result the iteration includes the .py file itself. Ideally, what I want to say is either:

  1. "Look in this subdirectory/folder, and iterate over all the files in there"

OR...

  2. "Look through all files in this directory and iterate over those with .txt extension"

I'm sure I'm asking for something fairly straightforward, but I do not know how to proceed. It's probably worth highlighting that I found the glob module through trial and error, so if this is the wrong way to go about it, feel free to correct me! Thanks.

cookie1986
    If you want to find all files with the .txt extension, then this link might be useful to you - https://stackoverflow.com/questions/3964681/find-all-files-in-a-directory-with-extension-txt-in-python – Wool Mar 22 '18 at 15:09

3 Answers

3

The glob.glob function takes a globbing pattern as its parameter. For instance, "*.txt" will match the files whose names end with .txt.

Here is how you can use it:

import glob

for file in glob.glob("*.txt"):
    # the with block closes the file even if read() raises
    with open(file) as f:
        text = f.read()
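
To cover the "look in this subdirectory" case from the question as well, the same pattern can be prefixed with a folder name. The following is only a minimal sketch: the function name `read_text_files`, the UTF-8 encoding, and the demo file names are assumptions made for illustration, not part of the original answer.

```python
import glob
import os
import tempfile


def read_text_files(directory):
    """Return {filename: contents} for every .txt file directly inside directory."""
    contents = {}
    # glob.escape protects any special characters in the directory name itself
    pattern = os.path.join(glob.escape(directory), "*.txt")
    for path in sorted(glob.glob(pattern)):
        # assumption: the files are UTF-8 encoded text
        with open(path, encoding="utf-8") as f:
            contents[os.path.basename(path)] = f.read()
    return contents


# Small demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_dir:
    for name, body in [("a.txt", "alpha"), ("b.txt", "beta"), ("script.py", "print(1)")]:
        with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as f:
            f.write(body)
    demo_result = read_text_files(demo_dir)
    print(demo_result)  # {'a.txt': 'alpha', 'b.txt': 'beta'}
```

Because the pattern only matches names ending in .txt, the script itself is skipped even when it lives in the same folder.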

If, however, you want to exclude some specific files, say .py files, globbing syntax does not support this directly, as explained here.

In that case, you'll need to get those files, and manually exclude them:

pythonFiles = glob.glob("*.py")
otherFiles = [f for f in glob.glob("*") if f not in pythonFiles]
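
A possible variation does the exclusion in a single pass by testing each name's ending rather than building a second list. This is a sketch under assumptions: the helper name `files_without_extension` is hypothetical, and unlike the snippet above it also skips directories via `os.path.isfile`.

```python
import glob
import os


def files_without_extension(extension, directory="."):
    """List regular files in directory whose names do not end with extension."""
    pattern = os.path.join(glob.escape(directory), "*")
    return [
        path for path in glob.glob(pattern)
        # keep regular files only, and drop those with the unwanted ending
        if os.path.isfile(path) and not path.endswith(extension)
    ]


# Everything in the current directory except Python files.
other_files = files_without_extension(".py")
```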
Right leg
1

glob.glob() uses the same wildcard pattern matching as a standard Unix-like shell. The pattern can of course be used to filter on extensions:

# this will list all ".py" files in the current directory
>>> glob.glob("*.py")
['__init__.py', 'manage.py', 'fabfile.py', 'fixmig.py']

but it can also be used to explore a given path, relative:

>>> glob.glob("../*")
['../etc', '../docs', '../setup.sh', '../tools', '../project', '../bin', '../pylint.html', '../sql']

or absolute:

>>> glob.glob("/home/bruno/Bureau/mailgun/*")
['/home/bruno/Bureau/mailgun/Domains_ Verify - Mailgun.html', '/home/bruno/Bureau/mailgun/Domains_ Verify - Mailgun_files']

And you can of course do both at once:

>>> glob.glob("/home/bruno/Bureau/*.pdf")
['/home/bruno/Bureau/marvin.pdf', '/home/bruno/Bureau/24-pages.pdf', '/home/bruno/Bureau/alice-in-wonderland.pdf']
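
Applied to the question, a relative pattern is resolved against the current working directory, so building an absolute pattern lets the script find its text files wherever it is launched from. The sketch below is only illustrative: `txt_files_in`, the `texts` subfolder, and the file names are hypothetical.

```python
import glob
import os
import tempfile


def txt_files_in(base_dir, subfolder):
    """Return sorted absolute paths of the .txt files in base_dir/subfolder."""
    folder = os.path.join(os.path.abspath(base_dir), subfolder)
    return sorted(glob.glob(os.path.join(glob.escape(folder), "*.txt")))


# In a real script, base_dir could be os.path.dirname(os.path.abspath(__file__));
# a temporary directory stands in for it here.
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "texts"))
    for name in ("one.txt", "two.txt", "notes.md"):
        open(os.path.join(base, "texts", name), "w").close()
    found = [os.path.basename(p) for p in txt_files_in(base, "texts")]
    print(found)  # ['one.txt', 'two.txt']
```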
bruno desthuilliers
-1

The solution is very simple.

for file in glob.glob('*'):
    if not file.endswith('.txt'):
        continue
    f = open(file)
    text = f.read()
    f.close()
castle-bravo
  • You're right, that is frustratingly simple! Thanks for taking the time to respond though. – cookie1986 Mar 22 '18 at 15:11
  • @DC_Liv No worries. As a general tip for solving problems like this, in Python 3 you can use tab completion to find all of the methods provided by a variable. Assign a representative filename to `x`, then type `x.` at the interpreter prompt and press tab. This should list all the methods of `str`, one of which is `endswith`. Most functions in the standard library are named pretty well. If you're not sure what something does, `help(x.endswith)` will usually provide an explanation. – castle-bravo Mar 22 '18 at 15:17
  • This is also a complete WTF - the whole point of glob is to avoid this kind of test and directly get names (files or folders) matching a glob expression. – bruno desthuilliers Mar 22 '18 at 15:17