Enforcing dictionary access via .get(...) to prevent KeyErrors

Question

I frequently run into KeyErrors triggered by situations like this:

d: dict[str, int] = {"a": 1}
foo = "bar"
...
d[foo]  # boom

Our team uses mypy for type checking. If it were possible to automatically detect uses of d[foo] and disallow them in favour of d.get(foo), the lookup would have an explicit type of Optional[int], which would stop the author from forgetting the edge case where the key does not exist.
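A minimal sketch of the behaviour the question is after, assuming mypy's default settings: `d.get(key)` is typed `Optional[int]`, so mypy rejects using the result as an `int` until it has been narrowed with a `None` check. The function name `lookup` and the fallback value `0` are illustrative only.

```python
from typing import Optional

d: dict[str, int] = {"a": 1}


def lookup(key: str) -> int:
    # d.get() returns Optional[int]; returning it unchecked would be a mypy error
    value: Optional[int] = d.get(key)
    if value is None:
        # the missing-key case has to be handled explicitly (0 is an arbitrary fallback)
        return 0
    # after the None check, mypy narrows value to int
    return value


print(lookup("a"))    # 1
print(lookup("bar"))  # 0
```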

Are there tools that can detect and warn about such square-bracket access? And are there situations I'm forgetting where .get(...) doesn't work as an alternative?
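On the second part of the question, a few cases where `.get(...)` is not a drop-in replacement (not an exhaustive list): `.get` only covers reads, so assignment, augmented assignment and deletion still need subscripts, and when `None` is itself a stored value, `.get` cannot tell "present but None" from "absent". The variable names below are illustrative.

```python
from typing import Optional

counts: dict[str, int] = {"a": 1}

# Writes, augmented assignment and deletion all go through subscripts;
# .get() has no equivalent for any of them.
counts["b"] = 2
counts["a"] += 1
del counts["b"]

# When None is a legitimate stored value, .get() returns None both for
# "present with value None" and for "key absent".
maybe: dict[str, Optional[int]] = {"x": None}
print(maybe.get("x"), maybe.get("y"))  # None None
# An explicit membership test still distinguishes the two cases.
print("x" in maybe, "y" in maybe)      # True False
```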

Why would you _want_ that? Better to crash with a `KeyError` than to stumble along and then get some obscure `TypeError` about `NoneType` later on... — wim, Nov 18 '22 at 21:07
@wim If you’re using mypy, it’ll force you to check that the value isn’t None. — Samwise, Nov 18 '22 at 21:09
In answer to OP, this sounds like a good job for pylint. I doubt there’s a built in rule for it, but it’s the kind of thing that’s very easy to write a custom rule for. — Samwise, Nov 18 '22 at 21:10
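The comment suggests a custom pylint rule. As a dependency-free stand-in (this uses the standard-library `ast` module, not pylint's plugin API), the sketch below flags subscript reads on names that were bound to a dict literal somewhere in the module. The function names are hypothetical, and the heuristic is deliberately crude: it ignores scopes and control flow, and it cannot see dicts that come from function calls or parameters.

```python
import ast


def _dict_literal_names(tree: ast.AST) -> set[str]:
    """Collect names assigned a dict literal, e.g. `d = {...}` or `d: T = {...}`."""
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif (
            isinstance(node, ast.AnnAssign)
            and isinstance(node.value, ast.Dict)
            and isinstance(node.target, ast.Name)
        ):
            names.add(node.target.id)
    return names


def find_dict_subscript_reads(source: str) -> list[tuple[int, str]]:
    """Return (line number, dict name) for each `name[key]` read on a known dict."""
    tree = ast.parse(source)
    dict_names = _dict_literal_names(tree)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Subscript)
            # Load context only: `d[k] = v` and `del d[k]` are Store/Del and are skipped
            and isinstance(node.ctx, ast.Load)
            and isinstance(node.value, ast.Name)
            and node.value.id in dict_names
        ):
            hits.append((node.lineno, node.value.id))
    return sorted(hits)


SAMPLE = 'd: dict[str, int] = {"a": 1}\nfoo = "bar"\nx = d[foo]\nd["b"] = 2\n'
print(find_dict_subscript_reads(SAMPLE))  # [(3, 'd')]
```

The annotation `dict[str, int]` is also a subscript in Load context, but its base name is `dict`, not a tracked variable, so it is not reported.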
Another thing to consider might be using TypedDict or dataclass for cases where the keys are statically knowable. That’ll give you the same type safety you’re looking for but much less awkwardly. — Samwise, Nov 18 '22 at 21:14
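A minimal `TypedDict` sketch of that suggestion, assuming the keys are known up front. The class name `Config` is hypothetical. Because `"a"` is declared as a required key, mypy types `cfg["a"]` as `int`, and it rejects subscripts with undeclared literal keys.

```python
from typing import TypedDict


class Config(TypedDict):
    a: int


cfg: Config = {"a": 1}

# mypy knows "a" is a required key, so this subscript is typed as int
value: int = cfg["a"]

# cfg["bar"] would be rejected by mypy because "bar" is not a key of Config
print(value)  # 1
```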
*Type* checking is something you do at compile time, *value* checking is something you do at runtime, unless you have some very strict restrictions on value that cannot be avoided, in which case something like an `Enum` may be what you want. You could also create a dict that has a `__getitem__` that always returns a value, even if the key isn't there (which is effectively what you're asking), but that will get very confusing to users fast, unless it's clear that your class is not a `dict` at all. — Grismar, Nov 18 '22 at 21:21
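A sketch of the "not a `dict` at all" variant described in that comment: a small read-only wrapper whose subscript returns `Optional[V]`, so mypy forces a `None` check at each use. The class name `LenientLookup` is hypothetical. It deliberately does not subclass `dict`, since overriding `dict.__getitem__` with a different return type would break the contract other code relies on (and mypy would flag the override as incompatible).

```python
from typing import Generic, Mapping, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class LenientLookup(Generic[K, V]):
    """Read-only wrapper whose subscript returns None for missing keys."""

    def __init__(self, data: Mapping[K, V]) -> None:
        self._data = data

    def __getitem__(self, key: K) -> Optional[V]:
        # delegate to Mapping.get, so a missing key yields None instead of KeyError
        return self._data.get(key)


scores = LenientLookup({"a": 1})
print(scores["a"])    # 1
print(scores["bar"])  # None
```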
Another option is to use a class rather than dictionary when the keys are known. — Barmar, Nov 18 '22 at 21:54
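A minimal sketch of the class-based alternative using a dataclass. The class name `Settings` is hypothetical. Fields become attributes, so a misspelled or missing "key" is caught by mypy as an unknown attribute instead of surfacing as a runtime `KeyError`.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    a: int


s = Settings(a=1)
print(s.a)  # 1
# s.bar is reported by mypy as a missing attribute and raises AttributeError at runtime
```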
@Samwise yes, TypedDicts are great, and we encourage their use, but it's easy to forget to use them. You also still end up interfacing with lots of dicts from external sources (libraries, JSONs, etc.) that aren't typed. I'm looking for a tool that can highlight all the places in our code where this happens. — sk29910, Nov 18 '22 at 21:58
Yes, as I said what you want is a linter (which you'd run as part of your commit/review process along with mypy) that will warn you when you're using a regular dict and/or using its subscript operator. — Samwise, Nov 18 '22 at 22:49
Well, you can use your own typeshed fork and modify it, replacing the return type of `dict.__getitem__` with `typing.NoReturn`. This will flag all such usages, but in a very weird way: by reporting the lines following each call as unreachable (you need `--warn-unreachable` or the corresponding config entry). This is a dirty hack, though, and I would never advise using it in production, but you can safely use it to identify all such usages in an existing codebase (or set it up as a separate CI step, not overlapping with the usual type checking, to reduce the astonishment caused by such `mypy` results). — STerliakov, Nov 18 '22 at 22:49
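The mechanism that hack relies on can be shown without forking typeshed: mypy treats any call to a function annotated `-> NoReturn` as never returning, so with `--warn-unreachable` it reports the statement after the call as unreachable. The function `strict_getitem` below is a hypothetical stand-in for the patched `dict.__getitem__`; unlike the real dict method, it actually raises, so that its runtime behaviour matches the `NoReturn` annotation.

```python
from typing import NoReturn


def strict_getitem(d: dict[str, int], key: str) -> NoReturn:
    # Hypothetical stand-in for a dict.__getitem__ whose stub was patched to NoReturn.
    # It always raises so the annotation is honest at runtime.
    raise KeyError(key)


def caller() -> int:
    strict_getitem({"a": 1}, "a")
    # mypy --warn-unreachable flags this statement as unreachable,
    # which is how the hack surfaces every call site
    return 1
```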

score 0 · Answer 1 · answered May 03 '23 at 01:02

You can definitely do this with Semgrep, and even suggest the fix:

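rules:
- id: use-dict-get
  patterns:
    # only consider names bound to a dict literal earlier in the same scope
    - pattern-inside: |
        $D = {...}
        ...
    # skip assignment targets: d[k] = v has no .get() equivalent
    - pattern-not-inside: $D[$K] = $V
    - pattern: $D[$K]
  fix: $D.get($K)
  message: "$D[$K] raises KeyError if the key is missing; use $D.get($K) and handle None"
  languages: [python]
  severity: WARNING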

Playground link: https://semgrep.dev/s/B3o5
