4

I am trying to count all the files in a folder and all its subfolders. For example, if my folder looks like this:

file1.txt
subfolder1/
├── file2.txt
├── subfolder2/
│   ├── file3.txt
│   ├── file4.txt
│   └── subfolder3/
│       └── file5.txt
└── file6.txt
file7.txt

I would like to get the number 7.

The first thing I tried is a recursive function that counts all files and calls itself for each subfolder:

import os

def get_file_count(directory: str) -> int:
    count = 0
    for filename in os.listdir(directory):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            count += 1
        elif os.path.isdir(path):
            # Recurse into the subfolder and add its file count
            count += get_file_count(path)
    return count

This works, but it takes a long time on large directories.

I also remembered this post, which shows a quick way to get the total size of a folder using win32com, and I wondered whether this library also offers a way to do what I am looking for. After searching, I only found this:

import win32com.client as com

fso = com.Dispatch("Scripting.FileSystemObject")
folder = fso.GetFolder(".")
count = folder.Files.Count  # files directly inside the folder only

But this returns only the number of files in the targeted folder itself, not in its subfolders.

So, do you know of an efficient function in Python that returns the number of files in a folder and all its subfolders?

crazycat256

7 Answers

2

IIUC, you can just do

sum(len(files) for _, _, files in os.walk('path/to/folder'))

or, equivalently, counting the files one at a time (this is unlikely to be faster, since len on a list is a constant-time operation):

sum(1 for _, _, files in os.walk('path/to/folder') for f in files)
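
As a quick check against the tree in the question, here is a minimal sketch that rebuilds that layout in a temporary directory and counts it with os.walk. The names count_files and build_sample_tree are hypothetical helpers introduced only for this illustration.

```python
import os
import tempfile
from pathlib import Path


def count_files(root: str) -> int:
    """Count the files under root, including all subfolders."""
    return sum(len(files) for _, _, files in os.walk(root))


def build_sample_tree(root: Path) -> None:
    # Hypothetical helper: recreates the layout shown in the question.
    for rel in [
        "file1.txt",
        "subfolder1/file2.txt",
        "subfolder1/subfolder2/file3.txt",
        "subfolder1/subfolder2/file4.txt",
        "subfolder1/subfolder2/subfolder3/file5.txt",
        "subfolder1/file6.txt",
        "file7.txt",
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


with tempfile.TemporaryDirectory() as tmp:
    build_sample_tree(Path(tmp))
    total = count_files(tmp)
    print(total)  # 7
```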
fsimonjetz
1

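This code counts all directory entries that are not directories (e.g. plain files and symlinks that do not point to a directory) under a specified root. Note that the glob pattern '*' does not match names that start with a dot, so hidden files and the contents of hidden directories are not included.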

Includes timing and an actual pathname used in the test:

from glob import glob, escape
import os
import time


def get_file_count(directory: str) -> int:
    count = 0
    for filename in glob(os.path.join(escape(directory), '*')):
        if os.path.isdir(filename):
            count += get_file_count(filename)
        else:
            count += 1
    return count

start = time.perf_counter()
count = get_file_count('/Volumes/G-DRIVE Thunderbolt 3')
end = time.perf_counter()

print(count)
print(f'{end-start:.2f}s')

Output:

166231
2.38s
DarkKnight
0

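I used os.walk().

Here is my sample, which collects the paths of all .tsv files under the current working directory; the number of files is then len(res['dir']). I hope it helps:

import os

def file_dir():
    directories = []
    res = {}
    cwd = os.getcwd()
    for root, dirs, files in os.walk(cwd):
        for file in files:
            # Keep only the files with the wanted extension
            if file.endswith(".tsv"):
                directories.append(os.path.join(root, file))
    res['dir'] = directories
    return res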
0

You could also use this shell command directly:

find DIR_NAME -type f | wc -l

This returns the count of all files. It can be run from Python with os.system().
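
One caveat: os.system() returns the command's exit status, not its output, so the count would only be printed to the terminal. Below is a minimal sketch that captures the result with subprocess instead. It assumes a Unix-like system where find is on the PATH, and count_files_with_find is a hypothetical name used only here.

```python
import subprocess


def count_files_with_find(directory: str) -> int:
    """Count regular files under directory by calling the external find command."""
    # -print0 separates paths with NUL bytes, so a file name that
    # contains a newline is still counted exactly once.
    result = subprocess.run(
        ["find", directory, "-type", "f", "-print0"],
        check=True,
        capture_output=True,
    )
    return result.stdout.count(b"\0")
```

Counting the NUL separators in Python replaces the wc -l step of the pipeline.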

LW42
  • Which OS does this command work on? – crazycat256 May 17 '22 at 12:58
  • This one in particular is a Linux command. It should work on Mac as well. Unfortunately, I have no experience with Windows, but [here](https://www.digitalcitizen.life/4-ways-count-number-folders-and-files-inside-folder/), for example, you can look up the corresponding Windows command. – LW42 May 17 '22 at 13:14
0

Another solution, using os.path and pathlib:

from pathlib import Path
from os.path import isfile

len([x for x in Path('./dir1').rglob('*') if isfile(x)])
user2314737
0

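The proper way is to use os.walk, as others have pointed out, but here is another solution that resembles your original as closely as possible.

You can use os.scandir to avoid the cost of constructing the entire list. Its entries also cache the file type reported by the operating system, so is_file() and is_dir() usually need no extra system call, which should make it substantially faster: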

import os

def get_file_count(directory: str) -> int:
    count = 0

    for entry in os.scandir(directory):
        if entry.is_file():
            count += 1

        elif entry.is_dir():
            # entry.path is already the full path of the entry
            count += get_file_count(entry.path)

    return count
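
For very deep trees, the same idea can be written without recursion. The following is a minimal sketch, not a benchmarked implementation: it keeps an explicit stack of directories, closes each scandir iterator with a with block, and (as an assumption that differs from the function above) does not follow symlinks, so symlinked files are not counted and symlinked directories are not entered. The name count_files_iterative is hypothetical.

```python
import os


def count_files_iterative(directory: str) -> int:
    """Count regular files under directory using os.scandir and an explicit stack."""
    count = 0
    stack = [directory]
    while stack:
        current = stack.pop()
        # The with block closes the directory handle promptly.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Visit this subdirectory later.
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    count += 1
    return count
```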
Adam.Er8
0

Here is another way.

import os
import re
import pandas as pd
 
def count_files(top, pattern, list_files):
  top = os.path.abspath(os.path.expanduser(top))
  res = []
  for root, dirs, files in os.walk(top):
    name_space = os.path.relpath(root, top)
    level = os.path.normpath(name_space).count(os.sep) + 1 if name_space != '.' else 0
    matches = [file for file in files if re.search(pattern, file)]
    if matches:
      if list_files:
        res.append((pattern, level, name_space, len(matches), matches))
      else:
        res.append((pattern, level, name_space, len(matches)))

  if list_files:
    df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count', 'files'])
  else:
    df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count'])
  return df

Consider the following directory structure

rajulocal@hogwarts ~/x/x5 % tree -a 
.
├── analysis.txt
├── count_files.ipynb
├── d1
│   ├── d2
│   │   ├── d3
│   │   │   └── f5.txt
│   │   ├── f3.txt
│   │   └── f4.txt
│   ├── f2.txt
│   └── f6.txt
├── f1.txt
├── f7.txt
└── .ipynb_checkpoints
    └── count_files-checkpoint.ipynb

4 directories, 10 files

To count the text files in each directory (the pattern "\.txt" matches any name containing .txt; a pattern such as r"\.txt$" would match only names ending in .txt)

rajulocal@hogwarts ~/x/x5 % ipython
Python 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.6.0 -- An enhanced Interactive Python. Type '?' for help.
...
In [2]: 
df = count_files("~/x/x5", "\.txt", False)
df
Out[2]: 
  pattern  level name_space  count
0   \.txt      0          .      3
1   \.txt      1         d1      2
2   \.txt      2      d1/d2      2
3   \.txt      3   d1/d2/d3      1

To see what those files are

In [3]: 
df = count_files("~/x/x5", "\.txt", True)
df
Out[3]: 
  pattern  level name_space  count                           files
0   \.txt      0          .      3  [analysis.txt, f1.txt, f7.txt]
1   \.txt      1         d1      2                [f6.txt, f2.txt]
2   \.txt      2      d1/d2      2                [f4.txt, f3.txt]
3   \.txt      3   d1/d2/d3      1                        [f5.txt]

To get the total number of matching files

In [4]: 
df['count'].sum()
Out[4]: 
8

To count files ending with .ipynb (ipython notebook files)

In [5]: 
df = count_files("~/x/x5", "\.ipynb", True)
df
Out[5]: 
   pattern  level          name_space  count                           files
0  \.ipynb      0                   .      1             [count_files.ipynb]
1  \.ipynb      1  .ipynb_checkpoints      1  [count_files-checkpoint.ipynb]

In [6]: 
df['count'].sum()
Out[6]: 
2

To count all the files

In [7]: 
df = count_files("~/x/x5", ".*", False)
df
Out[7]: 
  pattern  level          name_space  count
0      .*      0                   .      4
1      .*      1  .ipynb_checkpoints      1
2      .*      1                  d1      2
3      .*      2               d1/d2      2
4      .*      3            d1/d2/d3      1

In [8]: 
df['count'].sum()
Out[8]: 
10

which matches the file count from the tree command.
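
If pandas is not available, the per-directory counts can also be kept in a plain dictionary. This is a minimal sketch using only the standard library; count_matching_files is a hypothetical name, and unlike the function above it returns only the counts, not the levels or file lists.

```python
import os
import re


def count_matching_files(top: str, pattern: str) -> dict[str, int]:
    """Map each directory (relative to top) to its number of matching files."""
    top = os.path.abspath(os.path.expanduser(top))
    regex = re.compile(pattern)
    counts = {}
    for root, _dirs, files in os.walk(top):
        matches = sum(1 for name in files if regex.search(name))
        if matches:
            # Directories without any match are left out, as in the DataFrame version.
            counts[os.path.relpath(root, top)] = matches
    return counts
```

The overall total is then sum(count_matching_files(top, pattern).values()).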

Kamaraju Kusumanchi