I have a project whose lines of code I want to count. Is it possible to count all the lines of code in the file directory containing the project by using Python?
- Use [```os.walk```](https://docs.python.org/3/library/os.html#os.walk) to traverse the files and subdirectories, use [```endswith```](https://docs.python.org/3/library/stdtypes.html#str.endswith) to filter the files you want to count, ```open``` each file and use ```sum(1 for line in f)``` to count the lines, then aggregate all the file line counts (a sketch of this approach follows below). – wwii Jul 23 '16 at 17:02
- The built-in utility wc probably deserves mention here despite not being python. `wc -l filename` will return the number of lines in filename much faster than the pure python methods suggested in the answers. – John May 28 '19 at 17:26
- Here's more on [lines of code](https://blog.codinghorror.com/diseconomies-of-scale-and-lines-of-code/) in general. – djvg Nov 01 '22 at 14:10
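A minimal sketch of the approach wwii describes in the first comment (`os.walk` to traverse the tree, `endswith` to filter, `sum(1 for line in f)` to count); the path, extension filter, and error handling here are placeholders, not part of any answer below:

import os

def count_lines(rootdir, extensions=('.py',)):
    """Walk rootdir recursively and sum the line counts of matching files."""
    total = 0
    for dirpath, _, filenames in os.walk(rootdir):
        for name in filenames:
            if name.endswith(extensions):
                filepath = os.path.join(dirpath, name)
                # errors='ignore' keeps the count going on odd encodings
                with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                    total += sum(1 for _ in f)
    return total

print(count_lines('path/to/project'))  # placeholder path

The `wc -l` shortcut from the second comment could be wrapped with `subprocess` in much the same way if shell tools are available.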
12 Answers
Here's a function I wrote to count all lines of code in a python package and print an informative output. It will count all lines in all .py files it finds in the given directory and its subdirectories.
import os
def countlines(start, lines=0, header=True, begin_start=None):
    if header:
        print('{:>10} |{:>10} | {:<20}'.format('ADDED', 'TOTAL', 'FILE'))
        print('{:->11}|{:->11}|{:->20}'.format('', '', ''))

    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isfile(thing):
            if thing.endswith('.py'):
                with open(thing, 'r') as f:
                    newlines = f.readlines()
                    newlines = len(newlines)
                    lines += newlines

                    if begin_start is not None:
                        reldir_of_thing = '.' + thing.replace(begin_start, '')
                    else:
                        reldir_of_thing = '.' + thing.replace(start, '')

                    print('{:>10} |{:>10} | {:<20}'.format(
                            newlines, lines, reldir_of_thing))

    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isdir(thing):
            lines = countlines(thing, lines, header=False, begin_start=start)

    return lines
To use it, just pass the directory you'd like to start in. For example, to count the lines of code in some package `foo`:
countlines(r'...\foo')
Which would output something like:
ADDED | TOTAL | FILE
-----------|-----------|--------------------
5 | 5 | .\__init__.py
539 | 578 | .\bar.py
558 | 1136 | .\baz\qux.py

- `import io` and try using `io.open` instead of `open` if anyone has issues using python2 – MrMesees Oct 07 '18 at 14:42
- Ah ok, wasn't obvious ... first I wanted to know what 'ADDED' was all about? Then I realized that your 'TOTAL' is actually a cumulative count of the 'ADDED'. – Jeach Sep 23 '19 at 01:40
- You can add `newlines = list(filter(lambda l : l.replace(" ", "") not in ['\n', '\r\n'], newlines))` between `newlines = f.readlines()` and `newlines = len(newlines)` to not count empty lines. – lumpigerAffe Aug 19 '21 at 21:06
- You can also add `newlines = list(filter(lambda l : not l.startswith("#"), newlines))` to remove single line comments. – lumpigerAffe Aug 19 '21 at 21:15
`pygount` will display all the files in the folder, each with a count of code lines (excluding documentation).
https://pypi.org/project/pygount/
pip install pygount
To list the results for a directory, run:
pygount ~/path_to_directory
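If you'd rather drive pygount from a Python script than from the shell, one option is simply to shell out to the CLI shown above (a sketch, not pygount's Python API; the path is a placeholder):

import subprocess

# Requires pygount on PATH (pip install pygount); prints its per-file report.
result = subprocess.run(
    ['pygount', 'path/to/your/project'],  # placeholder path
    capture_output=True, text=True, check=True,
)
print(result.stdout)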

- This program manages to fill out my 16 GB of RAM and in turn renders my computer inoperable. – Philipp Oct 29 '20 at 11:02
- Managed to make it work by excluding my venv's and only consider .py files with: `pygount --suffix=py --folders-to-skip=venv,venv2 .` – Philipp Oct 29 '20 at 11:21
- pygount appears to need to run on LINUX to run as a command line tool as explained on its pypi webpages. It definitely doesn't run as described on Windows. – Jeff Winchell Jan 19 '22 at 20:52
As an addition to the pygount answer, they just added the option `--format=summary` to get the total number of lines in different file types in a directory.
pygount --format=summary ./your-directory
could output something like:
Language        Code      %  Comment      %
-------------  -----  -----  -------  -----
XML             1668  48.56       10   0.99
Python           746  21.72      150  14.90
TeX              725  21.11       57   5.66
HTML             191   5.56        0   0.00
markdown          58   1.69        0   0.00
JSON              37   1.08        0   0.00
INI               10   0.29        0   0.00
Text               0   0.00      790  78.45
__duplicate__      0   0.00        0   0.00
-------------  -----  -----  -------  -----
Sum total       3435            1007
This has a slight air of homework assignment :-) -- nonetheless, it's a worthwhile exercise, and Bryce93's formatting is nice. I think many would be unlikely to use Python for this given that it can be done quickly with a couple of shell commands, for example:
cat $(find . -name "*.py") | grep -E -v '^\s*$|^\s*#' | wc -l
Note that none of these solutions accounts for multiline (''') comments.
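If you do want multiline docstrings handled without leaving Python, the standard-library `tokenize` module can drive a rough count. This is only a sketch, not taken from any answer here, and the docstring detection is a crude heuristic: it counts the distinct lines that carry real code tokens, skipping comments, blank lines, and standalone string statements such as docstrings.

import tokenize

# Token types that never make a line count as "code" on their own.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
# A STRING following one of these (or starting the file) is a standalone
# string statement, i.e. docstring-like, so it is skipped. Heuristic only:
# a string at the start of a continuation line is also (wrongly) skipped.
_STMT_START = {None, tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}

def count_code_lines(path):
    code_lines = set()
    prev = None
    with tokenize.open(path) as f:  # respects the file's encoding declaration
        for tok in tokenize.generate_tokens(f.readline):
            if tok.type == tokenize.STRING and prev in _STMT_START:
                prev = tok.type
                continue  # docstring-like string: don't count its lines
            if tok.type not in _SKIP:
                code_lines.add(tok.start[0])  # first line the token appears on
            prev = tok.type
    return len(code_lines)

`count_code_lines` can then be dropped into any of the directory walkers above in place of a plain per-file line count.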

- Short and good enough for me! In case someone like me does not want to include subdirectories, you can use `cat *.py` in the first step instead. – Alex Telon Nov 05 '19 at 20:05
from os import listdir
from os.path import isfile, join
def countLinesInPath(path, directory):
    count = 0
    for line in open(join(directory, path), encoding="utf8"):
        count += 1
    return count

def countLines(paths, directory):
    count = 0
    for path in paths:
        count = count + countLinesInPath(path, directory)
    return count

def getPaths(directory):
    return [f for f in listdir(directory) if isfile(join(directory, f))]

def countIn(directory):
    return countLines(getPaths(directory), directory)
To count all the lines of code in the files in a directory, call the "countIn" function, passing the directory as a parameter.
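For example (the path here is just a placeholder):

print(countIn('path/to/your/project'))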
- Doesn't python already have a `len(file.readlines())`? And that's just one way I know of – OneCricketeer Jul 23 '16 at 16:05
- Yeah I think that could work too, this doesn't take that much more code though – Daniel Jul 23 '16 at 16:06
This is derived from Daniel's answer (though refactored enough that this won't be obvious). That one doesn't recurse through subdirectories, which is the behavior I wanted.
from os import listdir
from os.path import isfile, isdir, join
def item_line_count(path):
    if isdir(path):
        return dir_line_count(path)
    elif isfile(path):
        return len(open(path, 'rb').readlines())
    else:
        return 0

def dir_line_count(dir):
    return sum(map(lambda item: item_line_count(join(dir, item)), listdir(dir)))
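For example, to get a single total for a project (placeholder path):

print(dir_line_count('path/to/your/project'))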

Here's another one, using `pathlib`. It lists individual (relative) file paths with line count, total number of files, and total line count.
import pathlib
class LoC(object):
    suffixes = ['.py']
    skip = ['name of dir or file to skip', ...]

    def count(self, path, init=True):
        path = pathlib.Path(path)
        if path.name in self.skip:
            print(f'skipped: {path.relative_to(self.root)}')
            return
        if init:
            self.root = path
            self.files = 0
            self.lines = 0
        if path.is_dir():
            # recursive case
            for item in path.iterdir():
                self.count(path=item, init=False)
        elif path.is_file() and path.suffix in self.suffixes:
            # base case
            with path.open(mode='r') as f:
                line_count = len(f.readlines())
            print(f'{path.relative_to(self.root)}: {line_count}')
            self.files += 1
            self.lines += line_count
        if init:
            print(f'\n{self.lines} lines in {self.files} files')
Note I omitted the `__init__` method for clarity.
Usage example:
loc = LoC()
loc.count('/path/to/your/project/directory')

If you want to count how many lines are in your project, create a script inside of your project folder and paste the following into it:
import os
directory = "[project_directory]"
directory_depth = 100 # How deep you would like to go
extensions_to_consider = [".py", ".css"] # Change to ["all"] to include all extensions
exclude_filenames = ["venv", ".idea", "__pycache__", "cache"]
skip_file_error_list = True
this_file_dir = os.path.realpath(__file__)
print("Path to ignore:", this_file_dir)
print("=====================================")
def _walk(path, depth):
    """Recursively list files and directories up to a certain depth"""
    depth -= 1
    with os.scandir(path) as p:
        for entry in p:
            skip_entry = False
            for fName in exclude_filenames:
                if entry.path.endswith(fName):
                    skip_entry = True
                    break
            if skip_entry:
                print("Skipping entry", entry.path)
                continue
            yield entry.path
            if entry.is_dir() and depth > 0:
                yield from _walk(entry.path, depth)
print("Caching entries")
files = list(_walk(directory, directory_depth))
print("=====================================")
print("Counting Lines")
file_err_list = []
line_count = 0
len_files = len(files)
for i, file_dir in enumerate(files):
    if file_dir == this_file_dir:
        print("=[Rejected file directory", file_dir, "]=")
        continue
    if not os.path.isfile(file_dir):
        continue
    skip_File = True
    for ending in extensions_to_consider:
        if file_dir.endswith(ending) or ending == "all":
            skip_File = False
    if not skip_File:
        try:
            file = open(file_dir, "r")
            local_count = 0
            for line in file:
                if line != "\n":
                    local_count += 1
            print("({:.1f}%)".format(100*i/len_files), file_dir, "|", local_count)
            line_count += local_count
            file.close()
        except:
            file_err_list.append(file_dir)
            continue
print("=====================================")
print("File Count Errors:", len(file_err_list))
if not skip_file_error_list:
    for file in file_err_list:
        print(file)
print("=====================================")
print("Total lines |", line_count)
There are probably faster and more efficient ways to do this, but this is a nice start.
Variable Information
`directory` is the project directory you want counted.
`directory_depth` is how deep within the project's directory structure to scan, i.e. a depth of 3 means it will only scan the following levels:
- project_dir
  - sub_dir
    - sub2_dir
  - sub_dir
`extensions_to_consider` is the list of file extensions to count code for. If you only want to count .py files, set `extensions_to_consider = [".py"]`.
`exclude_filenames` is an array of file names (and directories) you don't want the script to count code for.
`skip_file_error_list` is a boolean variable. Set it to False if you wish to see a printout of all files that raised errors while counting; set it to True to skip that list.
How To Run
Run the script with the Python interpreter. To run it in a terminal:
python path_to_file.py
or
python3 path_to_file.py

Use radon
python3 -mpip install radon
radon raw -s pkg_dir/
** Total **
LOC: 2994
LLOC: 1768
SLOC: 1739
Comments: 71
Single comments: 29
Multi: 818
Blank: 408
- Comment Stats
(C % L): 2%
(C % S): 4%
(C + M % L): 30%
It will also calculate cyclomatic complexity:
a@debian:~/build/clean/scte35-threefive$ radon cc -a threefive
threefive/base.py
M 61:4 SCTE35Base.kv_clean - A
M 85:4 SCTE35Base.load - A
M 95:4 SCTE35Base._chk_var - A
C 9:0 SCTE35Base - A
M 34:4 SCTE35Base.as_hms - A
M 79:4 SCTE35Base._chk_nbin - A
M 17:4 SCTE35Base.__repr__ - A
M 20:4 SCTE35Base.as_90k - A
M 27:4 SCTE35Base.as_ticks - A
M 48:4 SCTE35Base.get - A
M 54:4 SCTE35Base.get_json - A
threefive/bitn.py
C 9:0 BitBin - A
M 30:4 BitBin.as_int - A
M 47:4 BitBin.as_charset - A
C 99:0 NBin - A
M 133:4 NBin.add_int - A
M 170:4 NBin.reserve - A
.....
246 blocks (classes, functions, methods) analyzed.
Average complexity: A (1.9024390243902438)
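radon also has a Python API if you want these numbers programmatically rather than as CLI output. A minimal sketch, assuming radon is installed and that its documented radon.raw.analyze and radon.complexity.cc_visit helpers are available (the directory path is a placeholder):

import os
from radon.raw import analyze          # raw metrics: loc, lloc, sloc, comments, ...
from radon.complexity import cc_visit  # cyclomatic complexity per function/class

def radon_report(rootdir):
    total_sloc = 0
    for dirpath, _, filenames in os.walk(rootdir):
        for name in filenames:
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'r', encoding='utf-8') as f:
                source = f.read()
            raw = analyze(source)      # result exposes .loc, .lloc, .sloc, .comments, ...
            blocks = cc_visit(source)  # list of analysed blocks, each with .complexity
            total_sloc += raw.sloc
            print(f'{path}: sloc={raw.sloc}, blocks={len(blocks)}')
    print(f'total sloc: {total_sloc}')

radon_report('pkg_dir')  # placeholder path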

Based on Bryce93's answer, with a `code_only` option to exclude comments, docstrings, and empty lines from the line count:
import os
def countlines(rootdir, total_lines=0, header=True, begin_start=None,
               code_only=True):
    def _get_new_lines(source):
        total = len(source)
        i = 0
        while i < len(source):
            line = source[i]
            trimline = line.strip()  # strip() rather than lstrip(" ") so blank lines compare equal to ''
            if trimline.startswith('#') or trimline == '':
                total -= 1
            elif '"""' in trimline:  # docstring begin
                if trimline.count('"""') == 2:  # docstring end on same line
                    total -= 1
                    i += 1
                    continue
                doc_start = i
                i += 1
                while '"""' not in source[i]:  # docstring end
                    i += 1
                doc_end = i
                total -= (doc_end - doc_start + 1)
            i += 1
        return total

    if header:
        print('{:>10} |{:>10} | {:<20}'.format('ADDED', 'TOTAL', 'FILE'))
        print('{:->11}|{:->11}|{:->20}'.format('', '', ''))

    for name in os.listdir(rootdir):
        file = os.path.join(rootdir, name)
        if os.path.isfile(file) and file.endswith('.py'):
            with open(file, 'r') as f:
                source = f.readlines()
            if code_only:
                new_lines = _get_new_lines(source)
            else:
                new_lines = len(source)
            total_lines += new_lines
            if begin_start is not None:
                reldir_of_file = '.' + file.replace(begin_start, '')
            else:
                reldir_of_file = '.' + file.replace(rootdir, '')
            print('{:>10} |{:>10} | {:<20}'.format(
                new_lines, total_lines, reldir_of_file))

    for file in os.listdir(rootdir):
        file = os.path.join(rootdir, file)
        if os.path.isdir(file):
            total_lines = countlines(file, total_lines, header=False,
                                     begin_start=rootdir, code_only=code_only)

    return total_lines
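Usage is the same as in Bryce93's answer, for example (placeholder path):

countlines(r'path\to\foo')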

I just did a variant of @Bryce93's response for a Python + Flask project(s)... ran some pivot tables on the resulting .csv file and the like (I manually marked files as 'active' downstream)... cheers
import os
import pandas as pd
def countlines(start, begin_start=None):
    global files
    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isfile(thing):
            if thing.endswith('.py') or thing.endswith('.html'):
                with open(thing, 'r') as f:
                    lines = f.readlines()
                count = len([l for l in lines if not l.strip().startswith('#')])
                functions, classes, comments = 0, 0, 0
                if thing.endswith('.py'):
                    functions = len([
                        l for l in lines if l.strip().startswith('def ')
                        and l.strip().endswith('):')
                    ])
                    classes = len([
                        l for l in lines if l.strip().startswith('class ')
                        and l.strip().endswith('):')
                    ])
                    comments = len([l for l in lines if l.strip().startswith('#')])
                    language = 'python'
                elif thing.endswith('.html'):
                    comments = len([l for l in lines if l.strip().startswith('<!--')])
                    language = 'jinja'
                else:
                    raise Exception(thing)
                path = str(thing)
                folder = '/'.join(path.split(repo)[-1].split('/')[:-1])
                files.append({
                    'path': path,
                    'repo': repo,
                    'language': language,
                    'filetype': thing.split('.')[-1],
                    'folder': folder,
                    'filename': thing.split('/')[-1],
                    'lines': count,
                    'functions': functions,
                    'classes': classes,
                    'comments': comments,
                })

    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isdir(thing):
            countlines(thing, begin_start=start)
files = []
repo = '<repo1>'
countlines('<path>/<repo1>')
master = pd.DataFrame(files)
files = []
repo = '<repo2>'
countlines('<path>/<repo2>')
master = pd.concat([master, pd.DataFrame(files)], ignore_index=False, sort=False)
master['active'] = False
master = master.sort_values(by=['repo', 'folder', 'language', 'filename'])
master.to_csv('../<blah>.csv')

- True, but there are/could also be some triple-quoted SQL queries, so I'm willing to take the loss. Good point @LeroyScandal – Chris Aug 07 '23 at 21:12
I've made a simple recursive function which prints the total LOC of the files in each folder and, at the end, returns the total LOC of the directory you passed in initially:
import os
def find_loc(cd=os.curdir):
    listdir = os.listdir(cd)
    if len(listdir) == 0:
        return 0

    loc = 0
    files = []
    folders = []
    next_dirs = []
    for x in listdir:
        path = os.path.join(cd, x)
        if os.path.isfile(path):
            files.append(x)
            file = open(path, 'r')
            for line in file:
                if line.strip() == '':  # skip blank lines (strip the trailing newline first)
                    continue
                loc += 1
        elif os.path.isdir(path):
            folders.append(x)
            next_dirs.append(path)

    print(f'cd: {cd}')
    print(f'files ({len(files)}): {files}')
    print(f'dirs: {folders}')
    print(f'loc: {loc}\n')

    for next_dir in next_dirs:
        loc += find_loc(next_dir)

    return loc

find_loc()
The `cd` param in the `find_loc` function is the directory from which you wish to start counting the LOC.
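For example, to count a specific project instead of the current directory (placeholder path):

total = find_loc('path/to/your/project')
print(f'Total LOC: {total}')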