
When I create a Linux thread, what exactly happens? For example, look at this code in C that creates a new thread:

    iret1 = pthread_create(&thread1, NULL, print_message_function, (void *) message1);

As you can see, a function must be passed. So the thread will execute the code of this function, but it keeps its memory shared with the main process. In Linux, a thread is what is called a 'lightweight process': it's a process in the kernel, but with its memory shared with the parent process.

The thing I don't understand is: what are the contents of this thread? I mean, are the function's instructions copied into the thread? I don't think so. I think that just the arguments are passed to the thread, and the parent process's instructions are executed on those arguments.
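
To make that concrete, here is roughly the picture I have, sketched in Python (`worker`, `shared_results`, and the message are just placeholder names): the thread is handed a reference to a function that already exists in the process's address space, plus its arguments, and whatever it writes to shared objects is immediately visible to the main thread.

    import threading

    shared_results = []                    # lives in the one shared address space

    def worker(message):
        # nothing is copied: the thread just starts executing this existing
        # function with the argument it was handed
        shared_results.append(message.upper())

    t = threading.Thread(target=worker, args=("hello",))
    t.start()
    t.join()
    print(shared_results)                  # ['HELLO'] -- no copying back needed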

I'm asking this because I'm studying Python threading/multiprocessing, and as I understand it, threading in Python is just a user-space 'emulation' of context switches: it isn't parallel in the kernel, so it suffers from performance problems because it can't run on multiple cores.
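
Here is a small sketch of the behaviour I mean (`count_down` is just a toy CPU-bound task): two threads doing CPU-bound work take about as long as running the same work serially, because CPython's GIL lets only one thread execute bytecode at a time.

    import threading
    import time

    def count_down(n):
        # pure-Python CPU-bound loop; it holds the GIL while it runs
        while n:
            n -= 1

    N = 10000000

    start = time.perf_counter()
    count_down(N)
    count_down(N)
    print("serial :", time.perf_counter() - start)

    start = time.perf_counter()
    t1 = threading.Thread(target=count_down, args=(N,))
    t2 = threading.Thread(target=count_down, args=(N,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print("threads:", time.perf_counter() - start)   # roughly the same, not half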

On the other hand, as I understand it, Python multiprocessing creates Python processes (not threads), so if the python3.5 binary, for example, is 3 MB, each process created with multiprocessing will take at least 3 MB (Python binary + script instructions + variables). Also, the function's instructions are copied into each process. Am I right?
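
For example, this is what I mean by separate processes (a minimal sketch; `worker` is just a placeholder): each one gets its own PID and its own memory.

    import multiprocessing as mp
    import os

    def worker(message):
        # runs in a separate interpreter process with its own PID and memory
        print(os.getpid(), "got", message)

    if __name__ == "__main__":
        print("parent pid:", os.getpid())
        procs = [mp.Process(target=worker, args=(m,)) for m in ("hi", "there")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()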

I'm asking this because I'm thinking about running a Python chatbot on a server, and I need multiple workers for this chatbot. Threading would be too slow, so each worker must be a Python process. (If I were in C, however, threading would be the perfect solution, because a thread is just a process with shared memory.)

I know that Stack Overflow doesn't like posts with multiple questions, but to ask what I want, I need to know whether my assumptions are correct. If they are, then here's what I need to know:

Even if, in Python's multiprocessing module, the function's instructions are copied into each process, is the only disadvantage compared with the standard shared-memory threading technique the extra memory used by each process? Even though the instructions are copied into each process, would it be the same (in terms of CPU, not memory) as having multiple threads with no extra copies of the instructions?
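
To make the CPU side of the question concrete (again, `count_down` is just a toy CPU-bound task), this is the behaviour I'm asking about: the same work spread over two worker processes should actually use two cores.

    import time
    from concurrent.futures import ProcessPoolExecutor

    def count_down(n):
        while n:
            n -= 1

    N = 10000000

    if __name__ == "__main__":
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=2) as pool:
            # the two tasks run in two separate processes, so on two cores
            list(pool.map(count_down, [N, N]))
        print("two processes:", time.perf_counter() - start)   # about half the serial time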

The main problem is that I'm not passing just a function, I'm passing a method of an object, so I think the entire object gets copied into each process. So if wasting memory is a problem on my server, is there a way, in Python, to make the workers use the function's instructions from the same place, but still take advantage of kernel context switching just like Linux threads do?
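
For example, would something like this avoid the copying (a rough sketch; the `ChatBot` class, its `handle` method, and the pool size are just placeholders)? The object is created before the pool, so with the Linux 'fork' start method the workers inherit it through copy-on-write pages instead of each receiving a pickled copy, and only the per-task arguments are serialized.

    import multiprocessing as mp

    class ChatBot:
        def __init__(self):
            self.vocabulary = {"hi": "hello"}      # imagine something large here

        def handle(self, message):
            return self.vocabulary.get(message, "?")

    bot = ChatBot()                                # built once, before the workers exist

    def handle_message(message):
        # workers inherit `bot` through fork/copy-on-write; only `message` is pickled
        return bot.handle(message)

    if __name__ == "__main__":
        mp.set_start_method("fork")                # Linux: children share pages copy-on-write
        with mp.Pool(processes=4) as pool:
            print(pool.map(handle_message, ["hi", "bye"]))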

  • A thread is not a process. A process can have multiple threads, and without the process all threads are gone. A process works on its own. – Kami Kaze Feb 03 '17 at 07:07
  • @KamiKaze yes, but I've read that a thread, in Linux, is a process marked as lightweight; it was just a comment, by the way –  Feb 03 '17 at 07:09
  • I don't think it's wrong to think of a thread as a lightweight process. Each thread has its own stack. Each thread shares the resources of the process. This goes into more detail and is a good tutorial: https://computing.llnl.gov/tutorials/pthreads/ – yano Feb 03 '17 at 07:21
  • A thread depends on a process and cannot run without it; it is given a function, reports back when it's finished, and has to be closed. A process is completely free of those limitations. In a sense you can say a thread is a lightweight process, but that is no small simplification. – Kami Kaze Feb 03 '17 at 07:38
  • I agree with all of that, except you lose me at "a process is completely free of those limitations". What limitations? A thread is simply the bare-bones necessities for a stream of execution. A single-threaded process still has a single thread of execution that's given a function to execute. Not sure if a process with zero threads could exist, but what would be the point of that? – yano Feb 03 '17 at 07:58
  • If you think you need multithreading to write a *chat bot*, you're doing something wrong. This is way, way, waaaayy too low-level a consideration for the kind of problem you're having. Read up on asynchronous IO (twisted in Python, I'd say, but there might be a new cool kid around the block) and then just write the code. If you really stumble upon performance problems, profile it and then ask a question. – Voo Feb 03 '17 at 09:24
  • You are comparing performance of C and Python, and the thread model is what has you worried? – stark Feb 03 '17 at 12:20
  • @Voo sorry, I didn't understand. The chatbot is intended to be used on a server, so without threading it'd be slow because the messages would be processed in series, not in parallel. Shouldn't I worry about performance? Thanks –  Feb 03 '17 at 15:44
  • @Guerlando and that's where asynchronous IO comes into play. The only reason you ever need concurrency is if your code is CPU-limited - something that I can't imagine a run-of-the-mill chat bot to be. – Voo Feb 03 '17 at 16:42
  • @Voo what do you mean by CPU-limited? For example, for each message, the chatbot Python script must fetch things from Mongo, run some algorithms to break up the words, search for their meanings, and so on. A chatbot that does no I/O would be very limited –  Feb 03 '17 at 17:44
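
A minimal sketch of the asynchronous-IO approach suggested in the comments above (using asyncio with Python 3.7+ syntax; the echo-style `handle_client` is only a stand-in for real bot logic, and the host/port are arbitrary): a single thread serves many connections because each coroutine yields control whenever it waits on I/O.

    import asyncio

    async def handle_client(reader, writer):
        # one coroutine per connection; while this one waits on I/O (socket
        # reads, database lookups), the others keep running on the same thread
        while True:
            data = await reader.readline()
            if not data:
                break
            reply = data.decode().strip().upper()   # stand-in for the real bot logic
            writer.write((reply + "\n").encode())
            await writer.drain()
        writer.close()

    async def main():
        server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
        async with server:
            await server.serve_forever()

    asyncio.run(main())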

0 Answers