
I wrote code to remove the background of 8,000 images, but the whole run takes approximately 8 hours to finish.

  • How can I improve its running time, given that I will have to work on larger datasets in the future?
  • Or do I have to rewrite it from scratch? If so, please suggest some sample code.
from rembg import remove
import cv2
import glob

for img in glob.glob('../images/*.jpg'):
    a = img.split('../images/')
    a1 = a[1].split('.jpg')
    try:
        cv_img = cv2.imread(img)
        output = remove(cv_img)
    except:
        continue
    cv2.imwrite('../output image/' + str(a1[0]) + '.png', output)
  • This may be more appropriate for https://codereview.stackexchange.com – Chris Sep 13 '22 at 05:42
  • See **Method 8** here for a possible approach: https://stackoverflow.com/a/51822265/2836621 – Mark Setchell Sep 13 '22 at 06:57
  • If you want another method for background subtraction in videos, check the [BGSLibrary](https://github.com/andrewssobral/bgslibrary); there is a benchmark section on the GitHub page. – t2solve Sep 13 '22 at 14:58

4 Answers


One simple approach is to divide the work across multiple threads; see ThreadPoolExecutor for details. Threads can actually help here because rembg's inference runs inside onnxruntime, which releases the GIL, so the workers are not serialized.

You can play around with max_workers= to see what gets the best results (a timing sketch follows the code below). It can be any positive integer; if you leave it unset, Python defaults to min(32, os.cpu_count() + 4).

This sample code is ready to run. It assumes the image files are in the same directory as your main.py and that the output_image directory exists.

import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import cv2
import rembg

out_dir = Path("output_image")
in_dir = Path(".")


def is_image(path: Path):
    # is_file() must be called; match the .jpg inputs case-insensitively
    return path.is_file() and path.suffix.lower() == ".jpg"


input_filenames = [p for p in in_dir.iterdir() if is_image(p)]


def process_image(in_path: Path):
    try:
        image = cv2.imread(str(in_path))
        if image is None:
            raise cv2.error("read failed")
        output = rembg.remove(image)
        out_path = out_dir / in_path.with_suffix(".png").name
        cv2.imwrite(str(out_path), output)
        return out_path
    except Exception as e:
        print(f"{in_path}: {e}", file=sys.stderr)


with ThreadPoolExecutor(max_workers=4) as executor:
    for result in executor.map(process_image, input_filenames):
        print(f"Processed image: {result}")
– Cyrill

Check out the U^2-Net repository. As in u2net_test.py, writing your own remove function and using a DataLoader can speed up the process. If you don't need alpha matting, skip it; otherwise you can add the alpha-matting code from rembg (see the sketch after the code below).

import glob
import os

import torch
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms

# These come from the U^2-Net repository
from data_loader import RescaleT, SalObjDataset, ToTensorLab
from model import U2NET
from u2net_test import normPRED, save_output  # helpers defined in u2net_test.py


def main():

    # --------- 1. get image path and name ---------
    model_name = 'u2net'  # or 'u2netp'

    image_dir = os.path.join(os.getcwd(), 'test_data', 'test_images')
    prediction_dir = os.path.join(os.getcwd(), 'test_data', model_name + '_results' + os.sep)
    model_dir = os.path.join(os.getcwd(), 'saved_models', model_name, model_name + '.pth')

    img_name_list = glob.glob(image_dir + os.sep + '*')
    print(img_name_list)

    # --------- 2. dataloader ---------
    test_salobj_dataset = SalObjDataset(img_name_list=img_name_list,
                                        lbl_name_list=[],
                                        transform=transforms.Compose([RescaleT(320),
                                                                      ToTensorLab(flag=0)]))
    test_salobj_dataloader = DataLoader(test_salobj_dataset,
                                        batch_size=1,
                                        shuffle=False,
                                        num_workers=1)

    # --------- 3. load the model (as in u2net_test.py) ---------
    net = U2NET(3, 1)
    net.load_state_dict(torch.load(model_dir))
    if torch.cuda.is_available():
        net.cuda()
    net.eval()

    # --------- 4. inference loop ---------
    for i_test, data_test in enumerate(test_salobj_dataloader):

        print("inferencing:", img_name_list[i_test].split(os.sep)[-1])

        inputs_test = data_test['image']
        inputs_test = inputs_test.type(torch.FloatTensor)

        if torch.cuda.is_available():
            inputs_test = Variable(inputs_test.cuda())
        else:
            inputs_test = Variable(inputs_test)

        d1, d2, d3, d4, d5, d6, d7 = net(inputs_test)

        # normalization
        pred = d1[:, 0, :, :]
        pred = normPRED(pred)

        # save results to the results folder
        os.makedirs(prediction_dir, exist_ok=True)
        save_output(img_name_list[i_test], pred, prediction_dir)

        del d1, d2, d3, d4, d5, d6, d7


if __name__ == "__main__":
    main()
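For completeness: if you decide you do need alpha matting, rembg exposes it as keyword arguments on remove(), so you don't have to port the matting code yourself. A minimal sketch; the threshold values shown are the defaults documented in the rembg README, and note that alpha matting makes each image noticeably slower:

import cv2
from rembg import remove

img = cv2.imread("input.jpg")
output = remove(
    img,
    alpha_matting=True,
    alpha_matting_foreground_threshold=240,
    alpha_matting_background_threshold=10,
    alpha_matting_erode_size=10,
)
cv2.imwrite("output.png", output)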
– Tsuki

Try parallelization with multiprocessing, as Mark Setchell mentioned in his comment. I rewrote your code according to Method 8 from https://stackoverflow.com/a/51822265/2836621. Multiprocessing should speed up your execution time. I did not test the code, so check whether it works. A further refinement for large datasets follows the code.

import glob
from multiprocessing import Pool

import cv2
from rembg import remove


def remove_background(filename):
    a = filename.split("../images/")
    a1 = a[1].split(".jpg")
    try:
        cv_img = cv2.imread(filename)
        output = remove(cv_img)
    except Exception:
        return  # 'continue' is invalid outside a loop; just skip this file
    cv2.imwrite("../output image/" + str(a1[0]) + ".png", output)


if __name__ == "__main__":  # required so spawned workers don't re-run this block
    files = glob.glob("../images/*.jpg")
    with Pool(8) as pool:
        pool.map(remove_background, files)
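One possible refinement for a dataset of this size: Pool.map holds on to all results until everything has finished, while imap_unordered yields as workers complete, and a chunksize batches task dispatch. A sketch using the same remove_background as above; the chunksize of 32 is an arbitrary starting point:

if __name__ == "__main__":
    files = glob.glob("../images/*.jpg")
    with Pool(8) as pool:
        for _ in pool.imap_unordered(remove_background, files, chunksize=32):
            pass  # results are None; we only care that the files get written

Keep in mind that every worker process loads its own copy of the rembg model, so memory usage grows roughly linearly with the pool size.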
– Pi.Lilac

Ah, you used the example from https://github.com/danielgatis/rembg#usage-as-a-library as the template for your code. Maybe try the other example, which uses a PIL image instead of OpenCV. PIL is usually a bit slower, but who knows. Try it with maybe 10 images and compare execution times (see the timing sketch after the code).

Here is your code using PIL instead of OpenCV. Not tested.

import glob

from PIL import Image
from rembg import remove

for img in glob.glob("../images/*.jpg"):
    a = img.split("../images/")
    a1 = a[1].split(".jpg")
    try:
        pil_img = Image.open(img)
        output = remove(pil_img)
    except Exception:
        continue
    output.save("../output image/" + str(a1[0]) + ".png")
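To run that 10-image comparison, something like this would do it. A sketch; the helper names are mine, and the model is loaded once up front so the one-time initialization doesn't skew either measurement:

import glob
import time

import cv2
from PIL import Image
from rembg import remove


def via_cv2(path):
    remove(cv2.imread(path))


def via_pil(path):
    remove(Image.open(path))


sample = glob.glob("../images/*.jpg")[:10]
remove(cv2.imread(sample[0]))  # warm-up: load the model once before timing

for name, fn in (("OpenCV", via_cv2), ("PIL", via_pil)):
    start = time.perf_counter()
    for path in sample:
        fn(path)
    print(f"{name}: {time.perf_counter() - start:.2f}s")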
– Pi.Lilac