
With Deno emerging as a Node.js rival, Rust's memory safety has come up in a lot of news articles. One piece stated that Rust and Go are good choices because they are memory-safe, as are Swift and Kotlin, although the latter two are not widely used for systems programming.

Safe Rust is the true Rust programming language. If all you do is write Safe Rust, you will never have to worry about type-safety or memory-safety. You will never endure a dangling pointer, a use-after-free, or any other kind of Undefined Behavior.

This piqued my interest: can Python be regarded as memory-safe, and either way, how safe or unsafe is it?

To begin with, the Wikipedia article on memory safety does not even mention Python, and the article on Python seems to mention only memory management. The closest I've come to finding an answer was this one by Daniel:

The Wikipedia article associates type safety with memory safety, meaning that the same memory area cannot be accessed as, e.g., both an integer and a string. In this way Python is type-safe. You cannot change the type of an object implicitly.

But even this only seems to imply a connection between two aspects (using an association from Wikipedia, which again is debatable) and no definitive answer on whether Python can be regarded as memory-safe.

qedk
  • Python and other high level languages abstract “memory” completely away from the programmer, so there’s virtually no code you can write that could trigger any such issues. – deceze May 23 '20 at 15:29
  • Python itself is memory safe, if you stay away from native extensions. It is also safe from memory leaks, except for a few edge cases. – CanciuCostin May 23 '20 at 15:36
  • @CanciuCostin That's a good claim but on what basis? How are you saying that it is (mostly) safe from memory leaks? – qedk May 23 '20 at 15:38
  • Python *the language* provides no way to allocate or deallocate memory, or to write directly to memory. That is *all* done in a Python *implementation* (like CPython, PyPy, etc., as well as extensions written in C or some other unsafe language). Python is memory-safe, insofar as it is completely *unaware* of how the underlying data is laid out in memory or accessed. – chepner May 23 '20 at 16:51
  • @chepner I do agree with your last statement, but does it extend to the prevailing meaning of "memory-safety"? That's my primary question. The point remains that it's not possible for Python to be 100% memory-safe even with no direct access to memory, similar to how statically typed does not equate to type-safe. – qedk May 23 '20 at 17:52
  • Then *no* language is memory-safe, and the term itself means nothing. But it is useful to draw the line between "bugs caused by the coder" and "bugs due to the implementation". – chepner May 23 '20 at 18:40

1 Answer


Wikipedia lists the following examples of memory safety issues:

Access errors: invalid read/write of a pointer
    Buffer overflow - out-of-bound writes can corrupt the content of adjacent objects, or internal data (like bookkeeping information for the heap) or return addresses.
    Buffer over-read - out-of-bound reads can reveal sensitive data or help attackers bypass address space layout randomization.

Python at least tries to protect against these.
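
As a rough illustration of that protection, here is a minimal sketch assuming CPython's built-in `list`; the `items` data and the `read_at` helper are hypothetical names made up for this example. An out-of-range read or write raises `IndexError` instead of touching adjacent memory.

```python
# Hypothetical data, used only for illustration.
items = [10, 20, 30]


def read_at(seq, index):
    """Return seq[index], or None if the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        # The list checks its bounds and raises here instead of
        # returning whatever happens to sit past the end of the buffer.
        return None


in_range = read_at(items, 1)   # a valid index
past_end = read_at(items, 10)  # far past the end of the list

# An out-of-bounds write fails the same way and leaves the list untouched.
try:
    items[10] = 99
    write_failed = False
except IndexError:
    write_failed = True
```

Native extensions and modules such as `ctypes` sit outside this guarantee, which matches the caveat raised in the comments on the question.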

    Race condition - concurrent reads/writes to shared memory

That's actually not that hard to do in languages with mutable data structures. (Advocates of functional programming and immutable data structures often use this fact as an argument in their favor).
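
A minimal sketch of such a race in Python, assuming the standard `threading` module; the worker functions and constants are hypothetical. Several threads increment a shared counter with and without a lock. Whether the unsynchronised version actually loses updates depends on the interpreter version and on timing, so it may well produce the full total on a given run. Note that this is a logic race: the result can be wrong, but the interpreter's memory is not corrupted.

```python
import threading

ITERATIONS = 100_000  # increments per thread (arbitrary choice)
THREADS = 4

unsafe_total = 0
safe_total = 0
lock = threading.Lock()


def unsafe_worker():
    global unsafe_total
    for _ in range(ITERATIONS):
        # Read-modify-write without synchronisation: another thread may
        # run between the read and the write, so an update can be lost.
        unsafe_total += 1


def safe_worker():
    global safe_total
    for _ in range(ITERATIONS):
        with lock:  # only one thread at a time performs the update
            safe_total += 1


def run(worker):
    threads = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run(unsafe_worker)
run(safe_worker)

expected = ITERATIONS * THREADS
print(unsafe_total, safe_total, expected)
```

The locked total always matches `expected`; the unlocked one can only match it or fall short.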

    Invalid page fault - accessing a pointer outside the virtual memory space. A null pointer dereference will often cause an exception or program termination in most environments, but can cause corruption in operating system kernels or systems without memory protection, or when use of the null pointer involves a large or negative offset.
    Use after free - dereferencing a dangling pointer storing the address of an object that has been deleted.
Uninitialized variables - a variable that has not been assigned a value is used. It may contain an undesired or, in some languages, a corrupt value.
    Null pointer dereference - dereferencing an invalid pointer or a pointer to memory that has not been allocated
    Wild pointers arise when a pointer is used prior to initialization to some known state. They show the same erratic behaviour as dangling pointers, though they are less likely to stay undetected.

There's no real way to prevent someone from trying to access a null pointer. In C# and Java, this results in an exception. In C++, this results in undefined behavior.
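
Python behaves like C# and Java here. As a minimal sketch (the `find_user` helper and the `users` data are hypothetical), using `None` as if it were a real object raises an ordinary exception rather than reading from an invalid address:

```python
def find_user(name, users):
    """Hypothetical lookup that returns None when the name is missing."""
    return users.get(name)


users = {"alice": {"email": "alice@example.com"}}

missing = find_user("bob", users)  # None: there is no such user

try:
    email = missing["email"]  # the Python analogue of a null dereference
    outcome = "no error"
except TypeError as exc:
    # None does not support indexing, so the interpreter raises TypeError.
    outcome = type(exc).__name__
```

The mistake is still a bug, but it surfaces as a catchable `TypeError` instead of undefined behavior.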

Memory leak - when memory usage is not tracked or is tracked incorrectly
    Stack exhaustion - occurs when a program runs out of stack space, typically because of too deep recursion. A guard page typically halts the program, preventing memory corruption, but functions with large stack frames may bypass the page.

Memory leaks in languages like C#, Java, and Python have different meanings than they do in languages like C and C++ where you manage memory manually. In C or C++, you get a memory leak by failing to deallocate allocated memory. In a language with managed memory, you don't have to explicitly de-allocate memory, but it's still possible to do something quite similar by accidentally maintaining a reference to an object somewhere even after the object is no longer needed.

This is quite easy to do with things like event handlers in C# and long-lived collection classes; I've worked on projects that had memory leaks even though we were using managed memory. In one sense, a managed-memory environment can make these issues more dangerous, because it gives programmers a false sense of security. In my experience, even experienced engineers often fail to do memory profiling or write test cases to check for this.
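
The same kind of leak can be sketched in Python. This is a minimal illustration, assuming CPython and the standard `weakref` and `gc` modules; the `Report` class and the `registry` list are hypothetical stand-ins for an object and a long-lived collection.

```python
import gc
import weakref


class Report:
    """Hypothetical object that we would like to be freed after use."""

    def __init__(self, name):
        self.name = name


# A long-lived, module-level collection, e.g. a cache or handler registry.
registry = []

report = Report("q1")
registry.append(report)
probe = weakref.ref(report)  # observes the object without keeping it alive

del report    # the program is "done" with the object...
gc.collect()
alive_while_registered = probe() is not None  # ...but the registry still holds it

registry.clear()  # dropping the forgotten reference finally releases it
gc.collect()
alive_after_clear = probe() is not None
```

Nothing here is unsafe in the C sense; the object simply stays reachable for as long as the registry holds it, which is exactly the leak described above.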

Stack exhaustion is quite easy to do in Python too (e.g. with infinite recursion).
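
A minimal sketch, assuming CPython's default recursion guard (the `recurse` function is hypothetical): the interpreter counts nested calls and raises `RecursionError` once the limit reported by `sys.getrecursionlimit()` is exceeded, so this kind of runaway recursion normally ends in an exception rather than a crash.

```python
import sys


def recurse(depth=0):
    # No base case: every call makes another call.
    return recurse(depth + 1)


try:
    recurse()
    result = "returned"
except RecursionError:
    # Raised by the interpreter when the call depth passes the limit.
    result = "RecursionError"

limit = sys.getrecursionlimit()
print(result, limit)
```

Raising the limit with `sys.setrecursionlimit` weakens this guard, and a value that is too high can still crash the process.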

    Heap exhaustion - the program tries to allocate more memory than the amount available. In some languages, this condition must be checked for manually after each allocation.

Still quite possible - I'm rather embarrassed to admit that I've personally done that in C# (although not in Python yet).

    Double free - repeated calls to free may prematurely free a new object at the same address. If the exact address has not been reused, other corruption may occur, especially in allocators that use free lists.
    Invalid free - passing an invalid address to free can corrupt the heap.
    Mismatched free - when multiple allocators are in use, attempting to free memory with a deallocation function of a different allocator
    Unwanted aliasing - when the same memory location is allocated and modified twice for unrelated purposes.

Unwanted aliasing is actually quite easy to do in Python. Here's an example in Java (full disclosure: I wrote the accepted answer); you could just as easily do something quite similar in Python. The others are managed by the Python interpreter itself.
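
A minimal Python sketch of unwanted aliasing (the variable names are hypothetical): multiplying a list of lists copies references, not rows, so every "row" is the same object and a write through one shows up in all of them.

```python
# Three references to ONE inner list, not three separate rows.
grid = [[0] * 3] * 3
grid[0][0] = 1  # intended to change only the first row

rows_are_shared = grid[0] is grid[1]

# Building each row separately avoids the aliasing.
independent = [[0] * 3 for _ in range(3)]
independent[0][0] = 1

print(grid)         # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Mutable default arguments and shared class-level attributes produce the same surprise for the same reason.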

So, it would seem that memory-safety is relative. Depending on exactly what you consider a "memory-safety issue," it can actually be quite difficult to entirely prevent. High-level languages like Java, C#, and Python can prevent many of the worst of these errors, but there are other issues that are difficult or impossible to completely prevent.

  • For comparison then, why do we regard Rust as "memory-safe" and they even claim that very strongly in their documentation but Python has no similar distinguishing claim? – qedk May 23 '20 at 16:02
  • The cynical part of me wants to say that Rust needs the hype and Python doesn't. Languages generally need some compelling use case in order to get widespread adoption. Python already has a lot of well-known use cases (especially with the existence of well-known ML and data analysis libraries like Pandas, Tensorflow, and scikit-learn). – EJoshuaS - Stand with Ukraine May 23 '20 at 16:23
  • Even if Python were memory-safe, it's not possible for Python to be absolutely memory-safe. I would argue, though, that the use cases of Python lie in its scripting capabilities, abundant modules, and easy-to-understand syntax. If Python were that highly memory-safe, systems programming would be a definite use case for it. – qedk May 23 '20 at 17:56