
I have a dataset comprising 100,000 images.

I have used the following code on this dataset, but the processing speed is terribly slow (on an AWS GPU instance).

import cv2
from progressbar import ProgressBar

def image_to_feature_vector(image, size=(128, 128)):
    # Resize the image and flatten it into a 1-D feature vector
    return cv2.resize(image, size).flatten()

imagePath = []  # list of paths to the dataset images
data = []

# Load each image, convert it to grayscale, and extract its feature vector
pbar = ProgressBar()
for i in pbar(range(len(imagePath))):
    image = cv2.imread(imagePath[i])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    features = image_to_feature_vector(image)
    data.append(features)

How can I improve the processing speed?

snfs
  • How long does it take? – Elis Byberi Nov 29 '17 at 14:56
  • @ElisByberi 1 hour – snfs Nov 29 '17 at 14:57
  • ~28 images/s. Have you tried it without `ProgressBar()`? Maybe it will be faster! The GPU will not speed up `ProgressBar()`! – Elis Byberi Nov 29 '17 at 15:00
  • I don't think the current implementation actually uses the GPU. AFAIK you need to convert images into `UMat` for `cvtColor` to run on the GPU (see the `UMat` sketch after these comments). See for example: https://stackoverflow.com/questions/27445398/how-to-read-umat-from-a-file-in-opencv-3-0-beta – der_die_das_jojo Nov 29 '17 at 15:03
  • 3
    generally for optimization purposes you should first analyze what is your bottleneck and then try to optimize the slowest part. Use a [profiler](https://docs.python.org/2/library/profile.html) to find out which part consumes post processing time – der_die_das_jojo Nov 29 '17 at 15:08
  • For the same cost as a GPU instance you can probably run the same script on multiple smaller general instances, each instance processing a subset of the images. – Tom Dalton Nov 29 '17 at 15:12
  • 1
    How many times do you actually need to run this on the same input? My guess would be just once, assuming you save the results. If that's the case, optimizing this might not be very valuable. At most you might just wanna run it in parallel on smaller chunks, and at the end merge the results. – Dan Mašek Nov 29 '17 at 17:17
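
To illustrate der_die_das_jojo's point about `UMat`, here is a minimal sketch of OpenCV's transparent API (T-API); it is not from the thread, `example.jpg` is a hypothetical path, and an OpenCL-capable device must be present for the GPU to actually be used:

import cv2

print(cv2.ocl.haveOpenCL())                  # is an OpenCL device usable at all?

image = cv2.imread("example.jpg")            # hypothetical path
u = cv2.UMat(image)                          # wrap in a UMat: data may live on the device
gray = cv2.cvtColor(u, cv2.COLOR_BGR2GRAY)   # can now run via OpenCL (T-API)
small = cv2.resize(gray, (128, 128))         # stays on the device
features = small.get().flatten()             # .get() downloads back to a NumPy array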
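
And to make the profiling suggestion concrete, a minimal sketch with the standard library's cProfile, assuming the loading loop above has been wrapped in a hypothetical load_dataset() function:

import cProfile
import pstats

cProfile.run("load_dataset()", "load.prof")  # load_dataset() is the loop above, wrapped
stats = pstats.Stats("load.prof")
stats.sort_stats("cumtime").print_stats(10)  # ten biggest cumulative-time offenders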

1 Answer


The real solution depends on where your bottleneck actually is, so profile first.

Anyway, the time spent reading (loading) images is something you can overlap with processing.

Your process is sequential:

[Diagram: the sequential process — read image, process, read next image, process, …]

In scenarios like this I use what is called an I/O pipeline or parallel pipeline. The idea is to use one thread to load the images serially and feed them to multiple processing threads. That way, while the input thread is reading, one or more worker threads are using the CPUs to process previously loaded images. Use a single thread to write out the data serially as well:

[Diagram: the parallel pipeline — one reader thread feeding multiple worker threads, with a single writer thread collecting results]

Unfortunately I don't use Python enough to write a polished example. This pattern is likely already implemented in some Python threading framework, but a rough sketch of the idea is below.
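
A minimal sketch of that pattern using only the standard library (`threading` + `queue`); this is not the answerer's code. It assumes `imagePath` is a list of readable file paths, as in the question, and it helps because OpenCV releases the GIL inside its C++ routines, so reading and processing genuinely overlap:

import cv2
import threading
import queue

def image_to_feature_vector(image, size=(128, 128)):
    return cv2.resize(image, size).flatten()

def reader(paths, work_q, n_workers):
    # One thread loads images serially and feeds the workers.
    for path in paths:
        work_q.put(cv2.imread(path))
    for _ in range(n_workers):
        work_q.put(None)  # sentinel: tells one worker to stop

def worker(work_q, out_q):
    # Workers run the CPU-bound steps while the reader keeps loading.
    while True:
        image = work_q.get()
        if image is None:
            break
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        out_q.put(image_to_feature_vector(gray))

imagePath = []                     # list of image file paths, as in the question
n_workers = 4
work_q = queue.Queue(maxsize=64)   # bounded, so the reader can't outrun RAM
out_q = queue.Queue()

threads = [threading.Thread(target=reader, args=(imagePath, work_q, n_workers))]
threads += [threading.Thread(target=worker, args=(work_q, out_q))
            for _ in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Note: results arrive in completion order, not input order.
data = [out_q.get() for _ in range(len(imagePath))]

Tune `n_workers` and the work-queue size to the instance's core count and memory; the bounded queue keeps the reader from decoding images faster than the workers can consume them.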

I use this approach to grab camera frames and process them at high speed, but I use C++ for it. If you don't mind programming in C++, you may find something inspiring in this impressive answer.

Duloren