
My project aims to detect object labels and coordinates, convert the labels into a string, and turn that string into speech with gTTS, but I keep getting an AttributeError when reading the prediction labels. I am new to this framework, so any help will be appreciated.

Code:

import cv2
from gtts import gTTS
import os
from ultralytics import YOLO

def convert_labels_to_text(labels):
    text = ", ".join(labels)
    return text

class YOLOWithLabels(YOLO):
    def __call__(self, frame):
        results = super().__call__(frame)
        labels = results.pred[0].get_field("labels").tolist()
        annotated_frame = results.render()
        return annotated_frame, labels

cap = cv2.VideoCapture(0)
model = YOLOWithLabels('yolov8n.pt')

while cap.isOpened():
    success, frame = cap.read()

    if success:
        annotated_frame, labels = model(frame)

        message = convert_labels_to_text(labels)

        tts_engine = gTTS(text=message)  # Initialize gTTS with the message

        tts_engine.save("output.mp3")
        os.system("output.mp3")

        cv2.putText(annotated_frame, message, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("YOLOv8 Inference", annotated_frame)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    else:
        break

cap.release()
cv2.destroyAllWindows()

Error:

File "C:\Users\alien\Desktop\YOLOv8 project files\gtts service\testservice.py", line 13, in __call__
    labels = results.pred[0].get_field("labels").tolist()
             ^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'pred'

Output of print(results):

0: 480x640 1 person, 272.4ms
Speed: 3.0ms preprocess, 272.4ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 640)
[ultralytics.yolo.engine.results.Results object with attributes:
    boxes: ultralytics.yolo.engine.results.Boxes object
    keypoints: None
    keys: ['boxes']
    masks: None
    names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
    orig_img: array([[[168, 167, 166],
            [165, 165, 165],
            [165, 166, 167],
            ...,
            [ 20,  24,  33],
            [ 19,  23,  33],
            [ 19,  23,  33]]], dtype=uint8)
    orig_shape: (480, 640)
    path: 'image0.jpg'
    probs: None
    save_dir: None
    speed: {'preprocess': 3.1604766845703125, 'inference': 307.905912399292, 'postprocess': 2.8924942016601562}]
  • Could you try: `labels = results[0].get_field("labels").tolist()` – doneforaiur Jun 21 '23 at 17:38
  • Thank you for your suggestion, kind sir, but it throws another attribute error: AttributeError: 'Results' object has no attribute 'get_field'. – asfriendlyascarbon Jun 21 '23 at 17:41
  • I'm not familiar with either YOLO or Google TTS, but could you try this example? https://github.com/orgs/ultralytics/discussions/2547#discussioncomment-5889458 I'm not sure how declaring a `YOLOWithLabels` class would help. Also, I couldn't find anything about `labels` here: https://docs.ultralytics.com/reference/yolo/engine/results/#ultralytics.yolo.engine.results.Probs.top1conf But I will keep looking. – doneforaiur Jun 21 '23 at 17:59
  • Ooo, found something interesting. `YOLO('yolov8n-cls.pt')` gives a model that only classifies what is visible in the image: https://docs.ultralytics.com/tasks/classify/ . In the meantime, can you just `print(results)` and post the output? I'm really confused, as their documentation is really weird. – doneforaiur Jun 21 '23 at 18:13
  • Thank you so much! I will go through the documentation even though it feels difficult to understand. I have attached the print(results) output to the post. – asfriendlyascarbon Jun 22 '23 at 05:22
  • Okay, we can clearly see it has detected `1 person`. You are very close to the solution, sadly I have to go but I'm sure you'll get an answer in no time! ^^ – doneforaiur Jun 22 '23 at 05:43
  • Yes, but I'm quite unsure of how to extract that from all the other data and the arrays which I don't need. I get that you have to go :( Thanks for the help so far. – asfriendlyascarbon Jun 22 '23 at 05:47
  • Try; `for result in results: print(result)`. That might also give you a clue. – doneforaiur Jun 22 '23 at 05:56
  • I'm still stuck and it seems you're my last hope :3 – asfriendlyascarbon Jun 22 '23 at 13:24

1 Answer


May God have mercy on whoever wrote Ultralytics's docs... Here's how you can print only the labels:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model('http://images.cocodataset.org/val2017/000000397133.jpg')

print(model.names)  # dict mapping class index -> class name

for result in results:
    boxes = result.boxes.cpu().numpy()  # move the detections to CPU as a numpy Boxes object
    for box in boxes:
        print(model.names[int(box.cls[0])])  # look the class index up to get its label

model.names is a dictionary of all the classes the model can predict. Each box has a cls ("class" for short) attribute holding the class indices detected for that box; look an index up in the model.names dictionary to get the label name.

PS: Each box has a list of class values, which indicates that the model can return multiple classes for one bounding box. This example only takes the first of the classes in the box.cls list.
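
For the original webcam-and-speech goal, the same lookup can be dropped into the loop from the question. Below is a minimal sketch, not a definitive implementation: it assumes an ultralytics release where result.plot() returns the annotated frame, and it only calls gTTS when at least one object was detected, since gTTS cannot synthesize empty text.

import cv2
import os
from gtts import gTTS
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
cap = cv2.VideoCapture(0)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # model(frame) returns a list of Results; a single frame gives a single entry
    result = model(frame)[0]

    # look each detected class index up in model.names to get its label
    labels = [model.names[int(cls)] for cls in result.boxes.cls]

    # result.plot() draws the boxes and labels onto a copy of the frame
    annotated_frame = result.plot()

    if labels:  # only speak when something was detected
        message = ", ".join(labels)
        gTTS(text=message).save("output.mp3")
        os.system("output.mp3")  # opens the default player, as in the question

    cv2.imshow("YOLOv8 Inference", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Generating and playing an mp3 on every frame will make the loop crawl, so in practice you would probably only build and speak the message when the set of detected labels changes.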

doneforaiur