
I was wondering if anyone knew of an implementation/library I could use to perform a deep copy of a PyObject without using the Python API.

I'd prefer something in C (which I currently use, and I am somewhat familiar with CFFI), but anything in any language (e.g. Rust) would be greatly appreciated.

The reason is that I'm attempting to analyze Python variables (for a real-time Python debug library), but I don't want to perform the analysis while the program being analyzed is executing, as that would greatly impact its performance.

If I could analyze the variables post-execution (but before program termination), that would be tremendously helpful. To do that, I'd need to save the variables in some other thread (preferably a C program that doesn't require the GIL, so that the main Python program can continue execution uninterrupted).

I personally don't think there's anything out there, as I've looked already, but thought it might be worth a shot asking someone on Stack Overflow.

Thank you.

BL12345

1 Answer


In C, memcpy() copies a struct byte for byte, and a struct is the closest equivalent to an OOP object that C offers. So if you can get the size of the Python object in memory and its memory location, you can use memcpy() to copy it; for a struct that contains no pointers, that byte-wise copy is already a complete (deep) copy (Deep copying array in C... malloc?, Making a deep copy of a struct...making a shallow copy of a struct). You can do this from within Python either by writing an extension module (https://docs.python.org/3/extending/extending.html) or through a mechanism like Cython (https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html).

However, if the Python object contains pointers to substructures, memcpy() will not produce a deep copy (C++ deep copying with objects). In that case you have to allocate memory for the copy of the Python object and also copy each substructure manually.
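As a rough illustration of that difference, here is a minimal sketch using Python's ctypes to stand in for C. The `Node` struct and its fields are hypothetical and are not a real CPython object layout; `ctypes.memmove` plays the role of memcpy().

```python
import ctypes


class Node(ctypes.Structure):
    # Hypothetical struct: one flat field plus a pointer to a separate buffer.
    _fields_ = [("count", ctypes.c_int),
                ("data", ctypes.POINTER(ctypes.c_int))]


buf = (ctypes.c_int * 3)(1, 2, 3)
src = Node(3, ctypes.cast(buf, ctypes.POINTER(ctypes.c_int)))

# Byte-wise copy, like memcpy(&shallow, &src, sizeof(src)).
# Only the pointer value is copied, so both structs share one buffer.
shallow = Node()
ctypes.memmove(ctypes.addressof(shallow), ctypes.addressof(src),
               ctypes.sizeof(Node))

# Manual deep copy: allocate a new buffer and copy the pointed-to ints too.
deep_buf = (ctypes.c_int * src.count)()
ctypes.memmove(deep_buf, src.data, src.count * ctypes.sizeof(ctypes.c_int))
deep = Node(src.count, ctypes.cast(deep_buf, ctypes.POINTER(ctypes.c_int)))

# Mutate the original buffer and compare what each copy sees.
buf[0] = 99
print(shallow.data[0], deep.data[0])
```

The byte-wise copy follows the change to the original buffer, while the manually copied struct keeps its own data. A real `PyObject` would need this per-pointer treatment for every object it references.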

https://agiledeveloper.com/articles/cloning072002.htm - Why Copying an Object is a terrible thing to do?

Update:

As stated in the comments, C's memcpy() is not an optimal solution because substructures may be implemented as pointers. So maybe try Python's copy module (https://pymotw.com/2/copy/), or analyze its source code and adapt it to your needs.
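For reference, a small sketch of what the copy module does, with a made-up `state` dictionary as the variable being snapshotted. Note that `copy.deepcopy` runs inside the interpreter and therefore under the GIL, so on its own it does not meet the "no Python API" requirement from the question.

```python
import copy

# Hypothetical variable to snapshot: a dict holding a nested list.
state = {"name": "job", "items": [1, 2, [3, 4]]}

shallow = copy.copy(state)      # new dict, but the same inner objects
deep = copy.deepcopy(state)     # recursively copies the inner objects too

# Mutate a nested object after taking both snapshots.
state["items"][2].append(5)

print(shallow["items"][2])      # shares the inner list with state
print(deep["items"][2])         # independent snapshot
```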

ralf htp
  • "if the Python object contains pointers to substructures memcpy will not produce a deep copy" -- doesn't a `PyObject` *always* contain pointers? – John Coleman Apr 05 '21 at 11:46
  • In Python, any passing is done by assignment: https://realpython.com/python-pass-by-reference/#understanding-assignment-in-python. There are no inherent pointers: https://realpython.com/pointers-in-python/, https://realpython.com/python-memory-management/ – ralf htp Apr 05 '21 at 11:55
  • Additionally, there's the issue of knowing how much data to copy. In CPython, objects of many different types are pointed to by pointers of type `PyObject *`. Obviously, Python knows how to figure out the actual type of the pointed-to object, so other code could, too, but that information needs to be determined before even a shallow copy can be performed. – John Bollinger Apr 05 '21 at 11:55
  • Deeper insight into the implementation of Python objects is in https://docs.python.org/3/reference/datamodel.html and also in https://stackoverflow.com/questions/7666873/cython-and-deepcopy-woes-with-referenced-methods-functions-any-alternative-id – ralf htp Apr 05 '21 at 12:00
  • @ralfhtp OP was asking how to make a deep copy of a `PyObject` -- which is (I believe) a C struct which contains pointers. I don't know what OP is trying to do, but I doubt that `memcpy` is adequate for the task. CPython uses pointers so heavily that the shallow nature of `memcpy` would likely introduce bugs sooner or later. – John Coleman Apr 05 '21 at 12:01
  • I'm keeping this answer for now because it analyzes the issue in more depth; feel free to add your own improved and accurate answer – ralf htp Apr 05 '21 at 12:05
  • @ralfhtp I have no better answer. OP seems to want to do something very ambitious -- somehow monitor a Python process without actually controlling the GIL, all the while taking deep snapshots of the variables. If any C solution is possible at all, `memcpy` would surely be part of it. – John Coleman Apr 05 '21 at 12:13
  • Yes, maybe someone has a solution to this. We'll wait... – ralf htp Apr 05 '21 at 12:16
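
To illustrate the point raised in the comments about knowing how much data to copy, here is a minimal sketch that inspects object sizes from within Python. It assumes CPython: `__basicsize__` and `__itemsize__` mirror the type's `tp_basicsize` and `tp_itemsize`, and the exact numbers vary by version and platform.

```python
import ctypes
import sys

# The fixed and per-item sizes depend on the object's type, so the type
# has to be known before even a shallow byte copy is possible.
for t in (int, float, list, tuple, dict):
    print(t.__name__, t.__basicsize__, t.__itemsize__)

# A tuple stores one pointer per element, not the element itself.
pointer_size = ctypes.sizeof(ctypes.c_void_p)
tuple_slot_is_pointer = tuple.__itemsize__ == pointer_size

# sys.getsizeof reports only the container, not the objects it points to:
# two lists of equal length have the same size whatever their contents.
small = [0] * 3
large = ["x" * 1000] * 3
same_container_size = sys.getsizeof(small) == sys.getsizeof(large)

print(tuple_slot_is_pointer, same_container_size)
```

Because containers hold only pointers to their elements, copying the bytes of the container alone yields a shallow copy, which is the concern the comments describe.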