-3

I am trying to automate the training process for YOLOv5 instead of manually annotating the images. I need images plus a text file containing the coordinates of each region of interest (ROI). I have a subscription to Azure Parser: when I pass it a PDF, I get back the coordinates of the ROIs. I want to use those coordinates to build the YOLO .txt file, and for the images I am converting the PDFs to images (since my input is only PDFs, not images).

So I have two questions:

  1. How do I convert the Azure coordinates into YOLO format?
  2. When I convert the PDF to an image (YOLO accepts only image input), the ROI coordinates change and I can't use the Azure Parser coordinates on the converted image. How can I convert PDF to image so that, after converting, the coordinates of each ROI remain the same as they were in the PDF?

(I tried libraries such as pdf2image to convert the PDF to an image, but the ROI coordinates still don't match, and I haven't found the correct formula for converting Azure Parser coordinates into YOLOv5 format.)

  • For the first question, please have a look at https://stackoverflow.com/questions/56115874/how-to-convert-bounding-box-x1-y1-x2-y2-to-yolo-style-x-y-w-h – dinhanhx Jul 17 '23 at 07:14

1 Answer

0

I don't know anything about Azure Parser, but I would suggest first looking at what the coordinates look like: [x,y,x,y,..] or [(x,y), (x,y),..], etc. Are the coordinates relative (between 0 and 1) or absolute? If absolute, are the numbers bigger than the image size?
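If it turns out the parser gives absolute corner coordinates (x1, y1, x2, y2), the formula from the linked question applies: normalize by the page size and switch to center/width/height. A minimal sketch, assuming corner coordinates and known page dimensions (the function name is mine, not from any library):

```python
def corners_to_yolo(x1, y1, x2, y2, page_w, page_h):
    """Convert absolute corner coordinates (x1, y1, x2, y2) to
    YOLO format (x_center, y_center, w, h), all normalized to 0..1."""
    x_c = ((x1 + x2) / 2) / page_w   # center x, normalized
    y_c = ((y1 + y2) / 2) / page_h   # center y, normalized
    w = (x2 - x1) / page_w           # width, normalized
    h = (y2 - y1) / page_h           # height, normalized
    return x_c, y_c, w, h

# Example: a 100x50 box with upper-left corner (50, 100) on a 1000x2000 page
print(corners_to_yolo(50, 100, 150, 150, 1000, 2000))  # -> (0.1, 0.0625, 0.1, 0.025)
```

Each line of the YOLO .txt file is then `class_id x_c y_c w h` separated by spaces.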

Then I suggest trying to draw these coordinates on the image:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw


def draw_rectangle_around_bounding_box(pil_img, height, width, bounding_box):
    '''Takes an ImageDraw object and a bounding box and draws a red rectangle
    at the bounding box coordinates.

    bounding_box is [x, y, w, h], NORMALIZED i.e. between 0 and 1,
    where (x, y) is the center of the bounding box.
    '''
    x, y, w, h = bounding_box

    # transform [x, y, w, h] into [(x1, y1), (x2, y2)] in pixel coordinates
    upper_left_point = ((x - w / 2) * width, (y - h / 2) * height)
    lower_right_point = ((x + w / 2) * width, (y + h / 2) * height)

    tuple_boundingboxes = [upper_left_point, lower_right_point]

    pil_img.rectangle(tuple_boundingboxes, outline="red", width=5)



def draw_boundingboxes(img_arr: np.ndarray, bounding_boxes: list, display: bool = True):
    '''
    Adds a red rectangle around each bounding box area and optionally displays
    the newly created image.

    INPUT:
        - img_arr : BGR image array (as returned by cv2.imread)
        - bounding_boxes : [x,y,w,h] for one object or [[x,y,w,h], [x,y,w,h], ...] for multiple objects;
                           x, y, w, h are NORMALIZED i.e. between 0 and 1
        - display : Boolean, True to display the image with matplotlib

    OUTPUT:
        - RGB image array with the rectangles drawn
    '''
    # OpenCV uses BGR while matplotlib and PIL use RGB, so convert before drawing
    im_rgb = cv2.cvtColor(img_arr, cv2.COLOR_BGR2RGB)

    # if a single box was passed (not a list of lists), wrap it so we can iterate uniformly
    if not isinstance(bounding_boxes[0], list):
        bounding_boxes = [bounding_boxes]

    img = Image.fromarray(im_rgb)

    height, width, _ = img_arr.shape

    # create the ImageDraw object once and draw every box on it
    pil_img = ImageDraw.Draw(img)
    for bounding_box in bounding_boxes:
        draw_rectangle_around_bounding_box(pil_img, height, width, bounding_box)

    np_img = np.array(img)

    if display:
        plt.imshow(np_img)
        plt.axis('off')
        plt.show()
    return np_img

Then, depending on the result, it will be easier to see where the problem is coming from.
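On your second question: the coordinates don't really change, only the units do. pdf2image renders the PDF at a chosen DPI (200 by default), while Azure's layout results for PDF input are, as far as I know, reported in inches together with the page width and height. Multiplying the inch values by the rendering DPI gives matching pixel positions. And since YOLO labels are normalized by the page size anyway, the DPI cancels out entirely. A sketch under that assumption (a 4-point polygon in inches plus page dimensions in inches; the function name is mine):

```python
def azure_polygon_to_yolo(polygon, page_w_in, page_h_in):
    """Convert an Azure-style polygon [(x, y), ...] in inches to normalized
    YOLO format (x_center, y_center, w, h).

    Because YOLO coordinates are normalized by page size, the rendering DPI
    cancels out: inches / page-inches == pixels / page-pixels.
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    x_c = (x_min + x_max) / 2 / page_w_in
    y_c = (y_min + y_max) / 2 / page_h_in
    w = (x_max - x_min) / page_w_in
    h = (y_max - y_min) / page_h_in
    return x_c, y_c, w, h

# A 1 x 0.5 inch box on a US-letter page (8.5 x 11 in)
poly = [(1.0, 2.0), (2.0, 2.0), (2.0, 2.5), (1.0, 2.5)]
print(azure_polygon_to_yolo(poly, 8.5, 11.0))
```

To overlay a box on the rendered image instead (e.g. with the drawing functions above), multiply each inch value by the same `dpi` you pass to pdf2image's `convert_from_path`.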

Timothee W