You have a few options to accomplish this, each with its own advantages and disadvantages.
Argument Lists
Let's get the easy one out of the way first: passing your data structures as arguments will definitely create a copy for each process, because the arguments are pickled and each worker unpickles its own private copy. It sounds like this is not what you want.
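Here's a minimal sketch (the worker function and values are just for illustration) that demonstrates the copying: the child's change to its argument never reaches the parent.

import multiprocessing

def worker(data):
    data.append(99)          # lands in the child's private copy
    print("child sees:", data)

if __name__ == '__main__':
    data = [1, 2, 3]
    p = multiprocessing.Process(target=worker, args=(data,))
    p.start()
    p.join()
    print("parent still sees:", data)  # unchanged: [1, 2, 3]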
Managers and Proxies
I recommend you try this method first. The multiprocessing library supports proxy objects. These act like aliases to shared objects and are easy to use if you're dealing with native types like lists and dictionaries. For individual operations you can safely modify the shared objects without worrying about lock details, because the manager serializes them (compound read-modify-write sequences still need an explicit lock). Since you mentioned lists, this may be your best bet. You can also create proxies to your own custom objects.
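As a minimal sketch of the idea (the worker function and initial values are just for illustration), a manager-backed list can be mutated from several processes and everyone sees the same object:

from multiprocessing import Manager, Process

def append_value(shared_list, value):
    # The proxy forwards this call to the manager process,
    # which applies it to the single shared list.
    shared_list.append(value)

if __name__ == '__main__':
    with Manager() as manager:
        shared_list = manager.list([1, 2, 3])
        workers = [Process(target=append_value, args=(shared_list, i))
                   for i in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(list(shared_list))  # [1, 2, 3] plus 0-3 in completion order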
Global Data Structures
In some situations, an acceptable solution is to make your data structures global instead of passing them as arguments. With the fork start method (the default on Linux), child processes share the parent's memory copy-on-write, so the structures will not be physically copied as long as you only read from them. This can trip people up: binding a new name to a global object counts as writing to it, because the object's reference count must be incremented. The reference count is how the garbage collector knows when memory can be freed: if an object's count is zero, no one is using it and it can be removed. Here's a code snippet that demonstrates this:
import sys
import multiprocessing

global_var = [1, 2, 3]

def no_effect1():
    # Reading elements does not bind a new name to the global object.
    print(global_var[0] + global_var[1])
    print("No Effect Count: {}".format(sys.getrefcount(global_var)))

def no_effect2():
    # A new list is built, but no lasting reference to global_var is kept.
    new_list = [i for i in global_var]
    print("New List Count: {}".format(sys.getrefcount(global_var)))

def change1():
    # Binding a local name to the global increments its reference count.
    local_handle = global_var
    print("Local Handle Count: {}".format(sys.getrefcount(global_var)))

def change2():
    # Storing the global inside a container also increments the count.
    new_dict = {'item': global_var}
    print("Contained Count: {}".format(sys.getrefcount(global_var)))

p_list = [multiprocessing.Process(target=no_effect1),
          multiprocessing.Process(target=no_effect2),
          multiprocessing.Process(target=change1),
          multiprocessing.Process(target=change2)]

for p in p_list:
    p.start()

for p in p_list:
    p.join()
This code produces this output:
3
No Effect Count: 2
New List Count: 2
Local Handle Count: 3
Contained Count: 3
In the no_effect1() function, we read and use data from the global structure without increasing its reference count. no_effect2() constructs a new list from the global structure. In both cases we are reading the global, but not creating any lasting reference to the same underlying object. If you use your global data structures in this way, you will not cause them to be copied between processes.
However, notice that in change1() and change2() the reference count was incremented, because we bound a local name to the same data structure. We never changed the list's contents, but incrementing its reference count writes to the object's header, which dirties the memory page it lives on and forces the operating system to copy that page.
Shared Ctypes
If you can finesse your shared data into C arrays, you can use shared Ctypes. These are arrays (or single values) that are allocated from shared memory. You can then pass around a wrapper without the underlying data being copied.
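For example, here's a minimal sketch using multiprocessing.Array (the scaling function is just for illustration): the child writes into the shared buffer and the parent sees the result.

from multiprocessing import Array, Process

def scale(shared_arr, factor):
    # The Array wrapper carries its own lock; hold it across the update.
    with shared_arr.get_lock():
        for i in range(len(shared_arr)):
            shared_arr[i] *= factor

if __name__ == '__main__':
    arr = Array('d', [1.0, 2.0, 3.0])  # 'd' = C double
    p = Process(target=scale, args=(arr, 10.0))
    p.start()
    p.join()
    print(arr[:])  # [10.0, 20.0, 30.0]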
MMAP
You can also create a shared memory map to put data into, but it can get complicated, and I would only recommend it if the proxy and global options don't work for you. There is a blog post here that has a decent example.
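If you do go this route, here's a minimal sketch assuming the fork start method (the Linux default), where the child inherits the parent's anonymous map; under spawn the child would re-create a fresh, empty map instead.

import mmap
import multiprocessing

# Anonymous memory map (fileno -1); with fork, the child inherits it.
shared_buf = mmap.mmap(-1, 64)

def reader():
    shared_buf.seek(0)
    print("child read:", shared_buf.read(5))  # b'hello' under fork

if __name__ == '__main__':
    shared_buf.write(b"hello")
    p = multiprocessing.Process(target=reader)
    p.start()
    p.join()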
Other Thoughts
One nitpicky point: in your question, you referred to a "VM". Since you didn't specify that you're running on a virtual machine, I assume you're referring to the Python interpreter as a VM. Keep in mind that Python is usually described as an interpreter and doesn't provide a virtual machine environment the way Java does. The line is certainly blurred, and the correct use of the terminology is open for debate, but people generally don't refer to the Python interpreter as a VM. See the excellent answers to this SO question for a more nuanced explanation of the differences.