9

I would like to automatically generate some sort of log of all the database changes that are made via the Django shell in the production environment.

We use schema and data migration scripts to alter the production database and they are version controlled. Therefore if we introduce a bug, it's easy to track it back. But if a developer in the team changes the database via the Django shell which then introduces an issue, at the moment we can only hope that they remember what they did or/and we can find their commands in the Python shell history.

Example. Let's imagine that the following code was executed by a developer in the team via the Python shell:

>>> tm = TeamMembership.objects.get(person=alice)
>>> tm.end_date = date(2022,1,1)
>>> tm.save()

It changes a team membership object in the database. I would like to log this somehow.

I'm aware that there are a bunch of Django packages related to audit logging, but I'm only interested in the changes that are triggered from the Django shell, and I want to log the Python code that updated the data.

So the questions I have in mind:

  • I can log the statements from IPython but how do I know which one touched the database?
  • I can listen to the pre_save signal for all model to know if data changes, but how do I know if the source was from the Python shell? How do I know what was the original Python statement?
viam0Zah
  • 25,949
  • 8
  • 77
  • 100
  • 1
    Hey, I don't think this is worthy answer so here you go: https://chase-seibert.github.io/blog/2012/10/26/django-log-shell-and-dbshell.html . This basically talks about overwritting logging in the shell by overwriting shell command, I think if you tinker a little with that shell command from article it will give you what you need – Michał Darowny Jan 12 '22 at 23:06
  • Thanks @MichałDarowny, it's helpful. I'm still missing some connection though. It's great that I can capture the history of IPython, but I'm looking for a way how to know if it changed producion data. Many times we use the shell just to debug customer tickets. That part of the history is irrelevant for now. – viam0Zah Jan 15 '22 at 14:20
  • @aaron Ideally I want to log the Python statements that triggered a change in the database. That would give me much more context of what the developer wanted to achieve than the raw SQL query that the Django ORM generated. – viam0Zah Jan 18 '22 at 14:45
  • @aaron Yes, that's what I ended up doing for the moment. However most of the time we are using the shell only to get data, not to alter it. So there's a big noise. If I could mark somehow the commands that changed data, would be great. For the moment what I can think of is grepping against the log and finding all the occurences of patterns such as `.save()`, `.delete()`, etc. – viam0Zah Jan 20 '22 at 14:47

4 Answers4

6

This solution logs all commands in the session if any database changes were made.

How to detect database changes

Wrap execute_sql of SQLInsertCompiler, SQLUpdateCompiler and SQLDeleteCompiler.

SQLDeleteCompiler.execute_sql returns a cursor wrapper.

from django.db.models.sql.compiler import SQLInsertCompiler, SQLUpdateCompiler, SQLDeleteCompiler

changed = False

def check_changed(func):
    def _func(*args, **kwargs):
        nonlocal changed
        result = func(*args, **kwargs)
        if not changed and result:
            changed = not hasattr(result, 'cursor') or bool(result.cursor.rowcount)
        return result
    return _func

SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)

How to log commands made via the Django shell

atexit.register() an exit handler that does readline.write_history_file().

import atexit
import readline

def exit_handler():
    filename = 'history.py'
    readline.write_history_file(filename)

atexit.register(exit_handler)

IPython

Check whether IPython was used by comparing HistoryAccessor.get_last_session_id().

import atexit
import io
import readline

ipython_last_session_id = None
try:
    from IPython.core.history import HistoryAccessor
except ImportError:
    pass
else:
    ha = HistoryAccessor()
    ipython_last_session_id = ha.get_last_session_id()

def exit_handler():
    filename = 'history.py'
    if ipython_last_session_id and ipython_last_session_id != ha.get_last_session_id():
        cmds = '\n'.join(cmd for _, _, cmd in ha.get_range(ha.get_last_session_id()))
        with io.open(filename, 'a', encoding='utf-8') as f:
            f.write(cmds)
            f.write('\n')
    else:
        readline.write_history_file(filename)

atexit.register(exit_handler)

Put it all together

Add the following in manage.py before execute_from_command_line(sys.argv).

if sys.argv[1] == 'shell':
    import atexit
    import io
    import readline

    from django.db.models.sql.compiler import SQLInsertCompiler, SQLUpdateCompiler, SQLDeleteCompiler

    changed = False

    def check_changed(func):
        def _func(*args, **kwargs):
            nonlocal changed
            result = func(*args, **kwargs)
            if not changed and result:
                changed = not hasattr(result, 'cursor') or bool(result.cursor.rowcount)
            return result
        return _func

    SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
    SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
    SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)

    ipython_last_session_id = None
    try:
        from IPython.core.history import HistoryAccessor
    except ImportError:
        pass
    else:
        ha = HistoryAccessor()
        ipython_last_session_id = ha.get_last_session_id()

    def exit_handler():
        if changed:
            filename = 'history.py'
            if ipython_last_session_id and ipython_last_session_id != ha.get_last_session_id():
                cmds = '\n'.join(cmd for _, _, cmd in ha.get_range(ha.get_last_session_id()))
                with io.open(filename, 'a', encoding='utf-8') as f:
                    f.write(cmds)
                    f.write('\n')
            else:
                readline.write_history_file(filename)

    atexit.register(exit_handler)
aaron
  • 39,695
  • 6
  • 46
  • 102
  • Thank you for the effort you put into this solution. I've just given it a try, and it's very close to what I wished to have. It's a great inspiration, I wasn't aware of all these internal APIs. Please take my bounty. – viam0Zah Jan 20 '22 at 21:31
1

I would consider something like this:

See:

https://docs.python.org/3/library/readline.html#readline.get_history_length

https://docs.python.org/3/library/readline.html#readline.get_history_item

Basically, the idea is that you could call get_history_length twice: at the beginning and end of the terminal session. That will allow you to get relevant lines of where the change took place using get_history_item. You may end up having more lines of history than what you actually need but at least there is enough context to see what's going on.

Lukasz Wiecek
  • 414
  • 2
  • 8
  • I'm not sure if I understand how django-revision plays a role here. Can you please clarify it? – viam0Zah Jan 20 '22 at 14:49
  • 1
    From what I was reading django-revision gives you version control over model instances. What I was thinking is that you could register your models to use django-revision. When you make changes using shell then manually creating new revisions, never creating revisions automatically on running app. That would allow you to quickly roll back to previous entities that were touched using shell and see what fields changed (diff: before changes, after change). – Lukasz Wiecek Jan 20 '22 at 20:43
  • The second application of django-revision here is to only log the history of commands during `atexit` when there were new revisions created. I imagine django-revision supporting something like this to avoid creating empty revisions. That would allow you to store only the relevant history parts - when model saves took place. – Lukasz Wiecek Jan 20 '22 at 20:45
  • Interesting idea, thanks! – viam0Zah Jan 20 '22 at 21:40
1

Based on the answer of aaron and the implementation of the built-in IPython magic %logstart, this is the solution we came up with in the end.

All commands of the last IPython session are logged in a history file if any of the commands triggered a database write through the Django ORM.

Here's an excerpt of the generated history file:

❯ cat ~/.python_shell_write_history
# Thu, 27 Jan 2022 16:20:28
#
# New Django shell session started
#
# Thu, 27 Jan 2022 16:20:28
from people.models import *
# Thu, 27 Jan 2022 16:20:28
p = Person.objects.first()
# Thu, 27 Jan 2022 16:20:28
p
#[Out]# <Person: Test Albero Jose Maria>
# Thu, 27 Jan 2022 16:20:28
p.email
#[Out]# 'test-albero-jose-maria@gmail.com'
# Thu, 27 Jan 2022 16:20:28
p.save()

Here's our manage.py now:

#!/usr/bin/env python
import os
import sys


def shell_audit(logfname: str) -> None:
    """If any of the Python shell commands changed the Django database during the
    session, capture all commands in a logfile for future analysis."""
    import atexit

    from django.db.models.sql.compiler import (
        SQLDeleteCompiler,
        SQLInsertCompiler,
        SQLUpdateCompiler,
    )

    changed = False

    def check_changed(func):
        def _func(*args, **kwargs):
            nonlocal changed
            result = func(*args, **kwargs)
            if not changed and result:
                changed = not hasattr(result, "cursor") or bool(result.cursor.rowcount)
            return result

        return _func

    SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
    SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
    SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)

    def exit_handler():
        if not changed:
            return None

        from IPython.core import getipython

        shell = getipython.get_ipython()
        if not shell:
            return None

        logger = shell.logger

        # Logic borrowed from %logstart (IPython.core.magics.logging)
        loghead = ""
        log_session_head = "#\n# New Django shell session started\n#\n"
        logmode = "append"
        log_output = True
        timestamp = True
        log_raw_input = False
        logger.logstart(logfname, loghead, logmode, log_output, timestamp, log_raw_input)

        log_write = logger.log_write
        input_hist = shell.history_manager.input_hist_parsed
        output_hist = shell.history_manager.output_hist_reprs

        log_write(log_session_head)
        for n in range(1, len(input_hist)):
            log_write(input_hist[n].rstrip() + "\n")
            if n in output_hist:
                log_write(output_hist[n], kind="output")

    atexit.register(exit_handler)


if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")
    try:
        from django.core.management import execute_from_command_line
    except ImportError:
        # The above import may fail for some other reason. Ensure that the
        # issue is really that Django is missing to avoid masking other
        # exceptions on Python 2.
        try:
            import django  # noqa: F401
        except ImportError:
            raise ImportError(
                "Couldn't import Django. Are you sure it's installed and "
                "available on your PYTHONPATH environment variable? Did you "
                "forget to activate a virtual environment?"
            )
        raise
    if sys.argv[1] == "shell":
        logfname = os.path.expanduser("~/.python_shell_write_history")
        shell_audit(logfname)
    execute_from_command_line(sys.argv)
viam0Zah
  • 25,949
  • 8
  • 77
  • 100
-1

You could use django's receiver annotation.

For example, if you want to detect any call of the save method, you could do:

from django.db.models.signals import post_save
from django.dispatch import receiver
import logging

@receiver(post_save)
def logg_save(sender, instance, **kwargs):
    logging.debug("whatever you want to log")

some more documentation for the signals

Alex SHP
  • 133
  • 9
  • 1
    How this will log the command issued by django shell? – jlandercy Jan 19 '22 at 10:07
  • Thanks for the idea. I considered this approach. I thought of using `pre_save` signal, but it's basically the same. What I miss here is exactly what @jlandercy asks too. How do I know what Python command was issued in the shell? – viam0Zah Jan 19 '22 at 17:09