
I'm attempting to use Caffe and Python to do real-time image classification. I'm using OpenCV to stream from my webcam in one process and, in a separate process, using Caffe to perform image classification on the frames pulled from the webcam. Then I'm passing the result of the classification back to the main process to caption the webcam stream.

The problem is that even though I have an NVIDIA GPU and am performing the Caffe predictions on the GPU, the main process gets slowed down. Normally, without doing any predictions, my webcam stream runs at 30 fps; with the predictions, it runs at 15 fps at best.

I've verified that Caffe is indeed using the GPU when performing the predictions, and that neither my GPU nor its memory is maxing out. I've also verified that my CPU cores are not maxed out at any point during the program. I'm wondering if I'm doing something wrong or if there is no way to keep these two processes truly separate. Any advice is appreciated. Here is my code for reference:

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted 
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            self.result_queue.put(text)

        return

import cv2
import caffe
import multiprocessing
import Queue 

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
frame_copy = frame.copy()  #make a copy to send to the worker process
task_empty = True
while rval:
    if task_empty:
       tasks.put(frame_copy)
       task_empty = False
    if not results.empty():
       text = results.get()
       #Add text to frame
       cv2.putText(frame, text, (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))  #position/font/color chosen arbitrarily
       task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy[:] = frame
    #Getting keyboard input 
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

I am pretty sure it is the caffe prediction slowing everything down, because when I comment out the prediction and pass dummy text back and forth between the processes, I get 30 fps again.

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            #text = net.predict(image)
            text = "dummy text"
            self.result_queue.put(text)

        return

import cv2
import caffe
import multiprocessing
import Queue 

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
frame_copy = frame.copy()  #make a copy to send to the worker process
task_empty = True
while rval:
    if task_empty:
       tasks.put(frame_copy)
       task_empty = False
    if not results.empty():
       text = results.get()
       #Add text to frame
       cv2.putText(frame, text, (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))  #position/font/color chosen arbitrarily
       task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy[:] = frame
    #Getting keyboard input 
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break
user3543300
  • Did you time the various blocks of your code? Data transfer between CPU and GPU may account for significant overheads. – Harsh Wardhan Sep 16 '16 at 07:57
  • How would I know if the transfer is what's slowing it down? There's no explicit code that transfers from GPU to CPU here – user3543300 Sep 16 '16 at 17:58
  • Did you try replacing `net.predict(image)` with some code that uses lots of CPU for about the same amount of time as a prediction? E.g., `for i in range(10000000): pass` takes about 0.22s on my machine. For my machine and webcam, your code ran at 30 fps this way. – Ulrich Stern Sep 29 '16 at 17:01
  • But the prediction should be occurring on the GPU right? So why would increasing the CPU usage help in this case? Bit confused – user3543300 Sep 29 '16 at 19:39
  • Yes, the prediction should be occurring on the GPU, and you could also use, e.g., `time.sleep(.15)`. But for a test of your "process communication scheme," why not stress the CPU? And the prediction _may_ cause decent CPU load in addition to GPU, especially for single-frame prediction. – Ulrich Stern Sep 30 '16 at 05:58
  • I've done a test with `time.sleep(1)` and I didn't experience a slowdown in my program. I've run Caffe in CPU_ONLY mode and noticed a more severe slowdown. I'm not sure why a single-frame prediction would stress the CPU that much, though. – user3543300 Sep 30 '16 at 06:06
  • Doesn't computation happen on the GPU? – user3543300 Sep 30 '16 at 06:19
  • I have used cuda-convnet for _non-real-time_ video analysis and had decent CPU and GPU load. I have not analyzed the CPU usage as to what part was me and what was cuda-convnet, though. I had used batches, though, and intuitively single frames may cause more CPU overhead. But my intuition may be wrong. :) – Ulrich Stern Sep 30 '16 at 06:22
  • I might be being very naive here, but which GPU do you have? Having a GPU doesn't guarantee fast predictions, in fact, a bad GPU might be slower than the CPU. If you want real time predictions, you will need a pretty damn good GPU (e.g. a TitanX). Can you just time how long does a `net.predict(image)` take? – Imanol Luengo Oct 01 '16 at 07:58
  • I have a GeForce 940MX. There is definitely a speed improvement over running Caffe in CPU_ONLY mode. A prediction takes around .15 seconds. If I used a deeper network, it can take up to 2 seconds. I'm ok with lag between predictions and what the webcam stream is displaying, but what's going on is the prediction computation is slowing down the very act of displaying my webcam stream even though they shouldn't be related. – user3543300 Oct 02 '16 at 06:35
  • Have you timed `cv2.waitKey(1)` when your code gets 15 fps? There is some "magic" (event handling) happening in this call (see [docs](http://docs.opencv.org/2.4/modules/highgui/doc/user_interface.html#waitkey)), and I think I once ran into a strange interaction. If this is not it, you could time other parts of your loop (e.g., `vc.read()`) to narrow down what statement may cause the slowdown to 15 fps. – Ulrich Stern Oct 05 '16 at 20:28
  • It seems to be taking anywhere from 3 to 50 ms. Is there a solution to this? – user3543300 Oct 06 '16 at 01:20
  • It's consistently 3 ms without the image prediction when the code runs at 30 fps. – user3543300 Oct 06 '16 at 01:24
  • I never delved into the details. This may be OS and OpenCV version dependent. The strange interaction I think I saw was on Ubuntu. OpenCV changed a decent amount between 2.4 and 3.X, so this may be worth a quick try. (I like 3.X in general, but reverted to 2.4 for one of my projects since 3.X messed up writing of MJPEG-encoded AVIs!) – Ulrich Stern Oct 06 '16 at 04:53
  • I'm running ubuntu 16.04 and OpenCV 3.1. I noticed with a deeper network the lag was worse so I'm not sure if it's entirely an OpenCV problem, but seems like it's worth looking into. – user3543300 Oct 06 '16 at 05:04
  • The strange interaction I think I had was with 2.4, so reverting may not solve things. It seems my 3.1 version (by default) used GTK and hence this code for [`waitKey()`](https://github.com/opencv/opencv/blob/3.1.0/modules/highgui/src/window_gtk.cpp#L1977). But googling a little before reading code may not be bad. ;) – Ulrich Stern Oct 06 '16 at 05:23
  • I found a pretty decent explanation of what might be happening here: http://answers.opencv.org/question/52774/waitkey1-timing-issues-causing-frame-rate-slow-down-fix/. It seems like `waitKey()` does a lot more than just a simple delay. Also, `imshow` is meant for debugging purposes only, so I might experiment with moving away from that for the GUI. – user3543300 Oct 06 '16 at 20:44
  • I disagree with Steven Puttemans that highgui is "for debugging only." If you do not need a fancy GUI, I found it a good choice. The [docs](http://docs.opencv.org/2.4/modules/highgui/doc/highgui.html) have the official word. One of the projects where I use it is a real-time tracker for _Drosophila_ written in Python. The tracker handles 16 webcams at 320x240 pixels and 7.5 fps on one i7-4930K machine using only about 9% CPU, `imshow`-ing the current frame and a real-time heat map for each camera. – Ulrich Stern Oct 07 '16 at 06:31
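
Following the timing suggestion in the comments above, a stripped-down timing loop (no prediction process) might look like the sketch below; the same `time.time()` bracketing can be dropped into the question's full loop to see which call jumps from a few milliseconds to tens of milliseconds while Caffe is busy:

import time

import cv2

vc = cv2.VideoCapture(0)
cv2.namedWindow("preview")

while True:
    t0 = time.time()
    rval, frame = vc.read()
    t_read = time.time() - t0
    if not rval:
        break

    t0 = time.time()
    cv2.imshow("preview", frame)
    t_show = time.time() - t0

    t0 = time.time()
    key = cv2.waitKey(1)
    t_wait = time.time() - t0

    # At a steady 30 fps the whole loop body should stay around 33 ms.
    print("read %.1f ms  imshow %.1f ms  waitKey %.1f ms"
          % (t_read * 1e3, t_show * 1e3, t_wait * 1e3))

    if key == 27:  # exit on ESC
        break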

4 Answers


Some Explanations and Some Rethinks:

  1. I ran my code below on a laptop with an Intel Core i5-6300HQ @ 2.3 GHz CPU, 8 GB RAM and an NVIDIA GeForce GTX 960M GPU (2 GB memory), and the result was:

    Whether I ran the code with Caffe running or not (i.e., whether or not I commented out `net_output = this->net_->Forward(net_input)` and the related code in `void Consumer::entry()`), I could always get around 30 fps in the main thread.

    A similar result was obtained on a PC with an Intel Core i5-4440 CPU, 8 GB RAM and an NVIDIA GeForce GT 630 GPU (1 GB memory).

  2. I ran @user3543300's code from the question on the same laptop; the result was:

    Whether Caffe was running (on the GPU) or not, I could also get around 30 fps.

  3. According to @user3543300's feedback, with the two versions of code mentioned above, @user3543300 could get only around 15 fps when running Caffe (on a laptop with an NVIDIA GeForce 940MX GPU and an Intel Core i7-6500U CPU @ 2.50GHz × 4). There is also a slowdown of the webcam frame rate when Caffe runs on the GPU as an independent program.

So I still think the problem most likely lies in hardware I/O limitations such as DMA bandwidth (this thread about DMA may be a hint) or RAM bandwidth. I hope @user3543300 can check this or find out the true cause that I haven't realized.

If the problem is indeed what I suspect above, then a sensible idea would be to reduce the memory I/O overhead introduced by the CNN. In fact, to solve similar problems on embedded systems with limited hardware resources, there has been some research on this topic, e.g. quantization, Structurally Sparse Deep Neural Networks, SqueezeNet, and Deep Compression. So hopefully applying such techniques will also help improve the webcam frame rate in the question.
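
As one concrete illustration of that direction (not part of the original answer), a much smaller model such as SqueezeNet can be dropped into the question's Python setup through the same `caffe.Classifier` interface. The paths below are placeholders, and SqueezeNet's 227x227 input size is assumed:

import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Placeholder paths -- point these at an actual SqueezeNet deploy/weights pair.
deploy_file = "/path/to/squeezenet/deploy.prototxt"
weights_file = "/path/to/squeezenet_v1.1.caffemodel"

# Same interface as in the question, just a far smaller net, which cuts both
# the compute per frame and the amount of data shuttled between CPU and GPU.
net = caffe.Classifier(deploy_file, weights_file,
                       channel_swap=(2, 1, 0),
                       raw_scale=255,
                       image_dims=(227, 227))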


Original Answer:

Try this C++ solution. It uses threads to hide the I/O overhead in your task. I tested it with `bvlc_alexnet.caffemodel` and `deploy.prototxt` to do image classification and didn't see any obvious slowdown of the main thread (webcam stream) while Caffe was running (on the GPU):

#include <stdio.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include "caffe/caffe.hpp"
#include "caffe/util/blocking_queue.hpp"
#include "caffe/data_transformer.hpp"
#include "opencv2/opencv.hpp"

using namespace cv;

//Queue pair for sharing image/results between webcam and caffe threads
template<typename T>
class QueuePair {
  public:
    explicit QueuePair(int size);
    ~QueuePair();

    caffe::BlockingQueue<T*> free_;
    caffe::BlockingQueue<T*> full_;

  DISABLE_COPY_AND_ASSIGN(QueuePair);
};
template<typename T>
QueuePair<T>::QueuePair(int size) {
  // Initialize the free queue
  for (int i = 0; i < size; ++i) {
    free_.push(new T);
  }
}
template<typename T>
QueuePair<T>::~QueuePair(){
  T *data;
  while (free_.try_pop(&data)){
    delete data;
  }
  while (full_.try_pop(&data)){
    delete data;
  }
}
template class QueuePair<Mat>;
template class QueuePair<std::string>;

//Do image classification(caffe predict) using a subthread
class Consumer{
  public:
    Consumer(boost::shared_ptr<QueuePair<Mat>> task
           , boost::shared_ptr<QueuePair<std::string>> result);
    ~Consumer();
    void Run();
    void Stop();
    void entry(boost::shared_ptr<QueuePair<Mat>> task
             , boost::shared_ptr<QueuePair<std::string>> result);

  private:
    bool must_stop();

    boost::shared_ptr<QueuePair<Mat> > task_q_;
    boost::shared_ptr<QueuePair<std::string> > result_q_;

    //caffe::Blob<float> *net_input_blob_;
    boost::shared_ptr<caffe::DataTransformer<float> > data_transformer_;
    boost::shared_ptr<caffe::Net<float> > net_;
    std::vector<std::string> synset_words_;
    boost::shared_ptr<boost::thread> thread_;
};
Consumer::Consumer(boost::shared_ptr<QueuePair<Mat>> task
                 , boost::shared_ptr<QueuePair<std::string>> result) :
 task_q_(task), result_q_(result), thread_(){

  //for data preprocess
  caffe::TransformationParameter trans_para;
  //set mean
  trans_para.set_mean_file("/path/to/imagenet_mean.binaryproto");
  //set crop size, here is cropping 227x227 from 256x256
  trans_para.set_crop_size(227);
  //instantiate a DataTransformer using trans_para for image preprocess
  data_transformer_.reset(new caffe::DataTransformer<float>(trans_para
                        , caffe::TEST));

  //initialize a caffe net
  net_.reset(new caffe::Net<float>(std::string("/path/to/deploy.prototxt")
           , caffe::TEST));
  //net parameter
  net_->CopyTrainedLayersFrom(std::string("/path/to/bvlc_alexnet.caffemodel"));

  std::fstream synset_word("path/to/caffe/data/ilsvrc12/synset_words.txt");
  std::string line;
  if (!synset_word.good()){
    std::cerr << "synset words open failed!" << std::endl;
  }
  while (std::getline(synset_word, line)){
    synset_words_.push_back(line.substr(line.find_first_of(' '), line.length()));
  }
  //a container for net input, holds data converted from cv::Mat
  //net_input_blob_ = new caffe::Blob<float>(1, 3, 227, 227);
}
Consumer::~Consumer(){
  Stop();
  //delete net_input_blob_;
}
void Consumer::entry(boost::shared_ptr<QueuePair<Mat>> task
    , boost::shared_ptr<QueuePair<std::string>> result){

  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  caffe::Caffe::SetDevice(0);

  cv::Mat *frame;
  cv::Mat resized_image(256, 256, CV_8UC3);
  cv::Size re_size(resized_image.cols, resized_image.rows);

  //for caffe input and output
  const std::vector<caffe::Blob<float> *> net_input = this->net_->input_blobs();
  std::vector<caffe::Blob<float> *> net_output;

  //net_input.push_back(net_input_blob_);
  std::string *res;

  int pre_num = 1;
  while (!must_stop()){
    std::stringstream result_strm;
    frame = task->full_.pop();
    cv::resize(*frame, resized_image, re_size, 0, 0, CV_INTER_LINEAR);
    this->data_transformer_->Transform(resized_image, *net_input[0]);
    net_output = this->net_->Forward();
    task->free_.push(frame);

    res = result->free_.pop();
    //Process results here
    for (int i = 0; i < pre_num; ++i){
      result_strm << synset_words_[net_output[0]->cpu_data()[i]] << " " 
                  << net_output[0]->cpu_data()[i + pre_num] << "\n";
    }
    *res = result_strm.str();
    result->full_.push(res);
  }
}

void Consumer::Run(){
  if (!thread_){
    try{
      thread_.reset(new boost::thread(&Consumer::entry, this, task_q_, result_q_));
    }
    catch (std::exception& e) {
      std::cerr << "Thread exception: " << e.what() << std::endl;
    }
  }
  else
    std::cout << "Consumer thread may have been running!" << std::endl;
};
void Consumer::Stop(){
  if (thread_ && thread_->joinable()){
    thread_->interrupt();
    try {
      thread_->join();
    }
    catch (boost::thread_interrupted&) {
    }
    catch (std::exception& e) {
      std::cerr << "Thread exception: " << e.what() << std::endl;
    }
  }
}
bool Consumer::must_stop(){
  return thread_ && thread_->interruption_requested();
}


int main(void)
{
  int max_queue_size = 1000;
  boost::shared_ptr<QueuePair<Mat>> tasks(new QueuePair<Mat>(max_queue_size));
  boost::shared_ptr<QueuePair<std::string>> results(new QueuePair<std::string>(max_queue_size));

  char str[100], info_str[100] = " results: ";
  VideoCapture vc(0);
  if (!vc.isOpened())
    return -1;

  Consumer consumer(tasks, results);
  consumer.Run();

  Mat frame, *frame_copy;
  namedWindow("preview");
  double t, fps;

  while (true){
    t = (double)getTickCount();
    vc.read(frame);

    if (waitKey(1) >= 0){
      consumer.Stop();
      break;
    }

    if (tasks->free_.try_peek(&frame_copy)){
      frame_copy = tasks->free_.pop();
      *frame_copy = frame.clone();
      tasks->full_.push(frame_copy);
    }
    std::string *res;
    std::string frame_info("");
    if (results->full_.try_peek(&res)){
      res = results->full_.pop();
      frame_info = frame_info + info_str;
      frame_info = frame_info + *res;
      results->free_.push(res);
    }    

    t = ((double)getTickCount() - t) / getTickFrequency();
    fps = 1.0 / t;

    sprintf(str, " fps: %.2f", fps);
    frame_info = frame_info + str;

    putText(frame, frame_info, Point(5, 20)
         , FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
    imshow("preview", frame);
  }
}

And in `src/caffe/util/blocking_queue.cpp`, make the small change below and rebuild Caffe:

...//Other stuff
template class BlockingQueue<Batch<float>*>;
template class BlockingQueue<Batch<double>*>;
template class BlockingQueue<Datum*>;
template class BlockingQueue<shared_ptr<DataReader::QueuePair> >;
template class BlockingQueue<P2PSync<float>*>;
template class BlockingQueue<P2PSync<double>*>;
//add these 2 lines below
template class BlockingQueue<cv::Mat*>;
template class BlockingQueue<std::string*>;
Dale
  • This looks interesting. I will try it out and report back. Just one question, how do I pass a `cv::Mat` as an input to a caffe network in C++. Also when I call the pretrained network, are there any parameters for `raw_scale` and `channel_swap` like there are in python? I've never used C++ caffe before. – user3543300 Oct 01 '16 at 02:53
  • @user3543300 The interface `DataTransformer::Transform(const cv::Mat& cv_img, Blob* transformed_blob)` in `data_transformer.cpp` will convert the `cv::Mat` to a `caffe::Blob` object, which will be taken as input to a caffe network by calling `Net::Forward(const vector<Blob<Dtype>*>& bottom, Dtype* loss)`. `DataTransformer::Transform()` will automatically perform the `channel_swap` procedure within it, but to normalize image data from [0,255] to [0,1] you should explicitly set a scale using the member function `set_scale(float value)` in `caffe::DataTransformer`. – Dale Oct 01 '16 at 07:15
  • I'm a bit confused, but in python I do this: `net = caffe.Classifier(net_model_file,net_pretrained, mean=mean, channel_swap=(2,1,0), raw_scale=255, image_dims=(256, 256))` Are you saying that's all done automatically? – user3543300 Oct 01 '16 at 07:35
  • I ran the code and my fps reduced to around 15 again. Not sure what is going on. I have a Nvidia GeForce 940MX GPU and Intel® Core™ i7-6500U CPU @ 2.50GHz × 4 – user3543300 Oct 02 '16 at 06:32
  • @user3543300 Is it GPU memory bandwidth that matters? – Dale Oct 02 '16 at 07:18
  • Not sure. I have 2 GB of GPU memory. Were you able to get 30 fps on the main thread? – user3543300 Oct 02 '16 at 07:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/124735/discussion-between-dale-song-and-user3543300). – Dale Oct 02 '16 at 08:01

It seems like Caffe's Python wrapper holds the Global Interpreter Lock (GIL) during its calls. Thus, calling any Caffe Python command blocks ALL Python threads.

A workaround (at your own risk) would be to release the GIL for specific Caffe functions. For instance, if you want to be able to run forward without the lock, you can edit `$CAFFE_ROOT/python/caffe/_caffe.cpp` and add this function:

void Net_Forward(Net<Dtype>& net, int start, int end) {
  Py_BEGIN_ALLOW_THREADS;   // <-- release the GIL
  net.ForwardFromTo(start, end);
  Py_END_ALLOW_THREADS;     // <-- reacquire the GIL
}

And replace .def("_forward", &Net<Dtype>::ForwardFromTo) with:

.def("_forward", &Net_Forward)

Don't forget to run `make pycaffe` after the change.

See this for more details.
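
With `_forward` releasing the GIL, a forward pass running in a background thread should no longer freeze the rest of the interpreter. A rough way to check this (a sketch, not from the original answer; the model paths are placeholders):

import threading
import time

import caffe

# Placeholder paths -- substitute the model actually used.
net = caffe.Net("/path/to/deploy.prototxt", "/path/to/weights.caffemodel", caffe.TEST)

def worker():
    # Set the mode in the thread that runs the forward passes
    # (Caffe's mode can be thread-local).
    caffe.set_mode_gpu()
    caffe.set_device(0)
    for _ in range(20):
        # The input blob's contents don't matter for this responsiveness check;
        # forward() dispatches to the patched _forward above.
        net.forward()

t = threading.Thread(target=worker)
t.start()

# With the GIL released during forward, these prints keep appearing at ~0.1 s
# intervals; without the patch they stall for the duration of each forward pass.
while t.is_alive():
    print("main thread still responsive at " + time.strftime("%H:%M:%S"))
    time.sleep(0.1)
t.join()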

Shai
  • Does the GIL apply to multiprocessing. Because I am using multiprocessing as opposed of multithreading in this example program. – user3543300 May 19 '17 at 05:43
  • @user3543300 I honestly don't know. I work with multi**threading** and not multiprocessing. I observed similar behavior with multiprocessing as well, but have not checked this solution under multiprocessing conditions. – Shai May 19 '17 at 07:39

One thing that might be happening in your code is that it runs in GPU mode for the first call, but computes the classification in CPU mode on later calls, since that is the default mode. In older versions of Caffe, setting GPU mode once was enough; newer versions need the mode to be set every time. You can try the following change:

def run(self):

        #Load caffe net -- code omitted 
        while True:
            caffe.set_mode_gpu()
            caffe.set_device(0)
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            self.result_queue.put(text)

        return

Also, please have a look at the GPU utilization while the consumer process is running. For NVIDIA GPUs you can use the following command:

nvidia-smi

The command above will show you the GPU utilization at runtime.

If that doesn't solve it, another solution is to run the OpenCV frame-extraction code in its own thread. Since it involves I/O and device access, you might benefit from running it on a thread separate from the GUI/main thread. That thread would push frames into a queue (or a shared slot), and the current consumer thread would predict on them. In that case, handle the shared data carefully with a critical section, as sketched below.
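
A minimal sketch of that grabber thread (not part of the original answer): instead of a queue it keeps only the newest frame in a lock-guarded slot, which plays the role of the critical section and avoids a backlog of stale frames:

import threading

import cv2

class FrameGrabber(threading.Thread):
    """Continuously reads from the webcam and keeps only the newest frame."""

    def __init__(self, device=0):
        threading.Thread.__init__(self)
        self.daemon = True
        self.vc = cv2.VideoCapture(device)
        self.lock = threading.Lock()   # critical section around the shared frame
        self.frame = None

    def run(self):
        while True:
            rval, frame = self.vc.read()
            if not rval:
                break
            with self.lock:
                self.frame = frame

    def latest(self):
        with self.lock:
            return None if self.frame is None else self.frame.copy()

# The GUI/main loop calls grabber.latest() for display, and the consumer
# thread/process classifies the same frames.
grabber = FrameGrabber(0)
grabber.start()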

MD. Nazmul Kibria
  • I tried both of your suggestions but didn't see an improvement. I used nvidia x server settings (on ubuntu) to see the gpu utilization after calling `set_mode_gpu` explicitly each time and saw gpu utilization jump to 99%. However I made my frame extraction one process and GUI display another process as you suggested (none of which were the main program), and didn't see any performance increase. In fact I think my cpu usage may have jumped slightly. – user3543300 Sep 23 '16 at 22:02
  • how much time it takes to classify a single frame in gpu? – MD. Nazmul Kibria Sep 24 '16 at 04:16
  • About .15 seconds – user3543300 Sep 24 '16 at 05:47
  • Each prediction takes .15 sec, so you can not process more than 6 frames per second. Though you use threads to predict, it will have a continuous lag if you approach to process 30 frames per second. I am not sure if you are using cudnn. If not you can use it. It accelerates speed than only GPU mode. – MD. Nazmul Kibria Sep 24 '16 at 06:27
  • Another approach can make it faster: you can process in batches. Say you start displaying video after an intentional 0.5 second delay, and you split it into 3 batch operations per second, where in each batch you process 10 frames. A batch may take a bit more time than a single frame, but it will surely be faster than single×n times. If you start a delayed display after 0.5 sec and one batch takes 300 ms to process, you will have 10 frames processed by the time you start showing frames... – MD. Nazmul Kibria Sep 24 '16 at 06:35
  • Batch means, batch process under caffe .... predicting multiple frames in caffe in one batch at the same time – MD. Nazmul Kibria Sep 24 '16 at 06:36
  • I'm using CUDA 8.0 and Cudnn 5.1.5. I am fine with a continuous lag as the content of my webcam stream isn't changing super fast. However the problem is my webcam stream lags even when it has nothing to do with the separate process where the prediction is happening. – user3543300 Sep 24 '16 at 08:38
  • I can try a batch prediction. How do you do that in caffe? – user3543300 Sep 24 '16 at 08:39
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/124092/discussion-between-md-nazmul-kibria-and-user3543300). – MD. Nazmul Kibria Sep 24 '16 at 08:46
  • I've added another bounty so I can't continue the discussion in chat. But I'm not sure why adding to `cv2.waitKey()` would help, as my webcam runs at 30 fps and the code also runs at 30fps if I comment out the caffe prediction – user3543300 Sep 28 '16 at 04:34
  • Try multi threading instead of multiprocessing, it could be costly in multiprocessing to exchange so many data of frames. see this [link](http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python) some comparison is there – MD. Nazmul Kibria Sep 28 '16 at 05:39

Try a multithreading approach instead of multiprocessing. Spawning processes is slower than spawning threads, though once they are running there is not much difference. In your case I think a threading approach will help, since so much frame data is involved and threads share memory instead of pickling frames between processes; a minimal drop-in sketch follows below.
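
A minimal drop-in sketch of that change against the question's code (not from the original answer; only the base class and queue types differ, and `load_net()` stands in for the question's omitted net-loading code). Note that, unlike a separate process, the prediction call will block other Python threads unless the GIL is released, as in the pycaffe patch in another answer:

import threading
import Queue  # Python 2; use `queue` on Python 3

import caffe

class Consumer(threading.Thread):          # was multiprocessing.Process
    def __init__(self, task_queue, result_queue):
        threading.Thread.__init__(self)
        self.daemon = True
        self.task_queue = task_queue       # Queue.Queue instead of multiprocessing.Queue
        self.result_queue = result_queue

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        net = load_net()                   # placeholder for the omitted net-loading code
        while True:
            image = self.task_queue.get()
            # crop image -- code omitted, as in the question
            text = net.predict(image)
            self.result_queue.put(text)

tasks = Queue.Queue()
results = Queue.Queue()
consumer = Consumer(tasks, results)
consumer.start()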

MD. Nazmul Kibria
  • Python has a GIL lock, where only 1 thread can run at a time, so I'm not sure if this is the best for parallelism and speed. – user3543300 Sep 28 '16 at 06:15
  • http://stackoverflow.com/questions/32899077/is-it-possible-to-read-webcam-frames-in-parallel – MD. Nazmul Kibria Sep 28 '16 at 06:48
  • I'm only spawning each process once, so I'm not sure if that will make a difference. I'd tired a multithreading approach in the past but it actually slowed everything down. Here's a good explanation: https://wiki.python.org/moin/GlobalInterpreterLock. – user3543300 Sep 28 '16 at 23:41