
I have a program that captures a live video feed from a CSI camera connected to a Jetson Nano and estimates the pose of an ArUco marker. However, when I move the marker in real life, the position returned by my code only changes about 0.5–1 s later. I am very new to all of this and I can't figure out where the bottleneck is.

My code is below. I measured the average time it takes for the main loop, where the frames are analysed, to complete, and got around 50–60 ms, so I don't think the detection and pose estimation themselves are the problem.

I commented out cv::imshow("Image", image); (to see if the delay was coming from that), but the change in the delay is minimal to none. Since a window is no longer shown, cv::waitKey(1) == 'q' will no longer trigger, which is why I added total_time >= 30 as an alternative exit condition. From what I understand, whatever time I set in cv::waitKey makes the program wait that amount of time before continuing. Taking this into account, I changed the line to cv::waitKey(1000) == 'q', so that the program waits a second before analysing the next frame (effectively capping the frame rate at 1 FPS) to see what the delay would be. Well, it takes around 12 seconds for any movement in the real world to show up in the output (I uncommented cv::imshow("Image", image); to see which frames were being analysed).

I thought that when the time in cv::waitKey was up, the current frame would be captured, but it seems like it is either capturing an old frame for some reason or the VideoCapture itself has a massive delay.
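
One test I can think of (just a sketch, untested; it assumes the stale frames sit in a queue on the capture side) would be to grab and throw away a few frames before decoding one, to see whether the picture gets fresher:

// Hypothetical drain loop: discard queued frames, then decode only the newest grab.
for (int i = 0; i < 10; i++)
{
    cap.grab();
}
cv::Mat image;
cap.retrieve(image);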

Can anyone help me figure out what the problem is and any possible ways to fix it? Thanks in advance.

#include <opencv2/opencv.hpp>
#include <opencv2/aruco.hpp>
#include <opencv2/objdetect/aruco_detector.hpp>
#include <opencv2/objdetect/aruco_dictionary.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>

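// Builds the GStreamer pipeline string: nvarguscamerasrc (CSI camera, NVMM memory)
// -> nvvidconv (flip/scale, copies into CPU memory) -> videoconvert -> appsink (frames for OpenCV).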
std::string gstreamer_pipeline(int capture_width, int capture_height, int display_width, int display_height, int framerate, int flip_method, int sensor_mode)
{
    return "nvarguscamerasrc sensor_mode=" + std::to_string(sensor_mode) + " ! video/x-raw(memory:NVMM), width=(int)" + std::to_string(capture_width) + ", height=(int)" +
           std::to_string(capture_height) + ", framerate=(fraction)" + std::to_string(framerate) +
           "/1 ! nvvidconv flip-method=" + std::to_string(flip_method) + " ! video/x-raw, width=(int)" + std::to_string(display_width) + ", height=(int)" +
           std::to_string(display_height) + ", format=(string)GRAY8 ! videoconvert ! video/x-raw, format=(string)GRAY8 ! appsink";
}

bool loadCameraCalibration(const std::string &filename, cv::Mat &cameraMatrix, cv::Mat &distCoeffs)
{
    cv::FileStorage fs(filename, cv::FileStorage::READ);
    if (!fs.isOpened())
    {
        return false;
    }

    fs["Camera_Matrix"] >> cameraMatrix;
    fs["Distorsion_Coefficients"] >> distCoeffs;
    fs.release();

    return true;
}

void detectCorners(const cv::Mat &image, const cv::aruco::ArucoDetector &detector, std::vector<int> &ids, std::vector<std::vector<cv::Point2f>> &corners)
{
    // Image is already grayscale.

    // Detect the markers in the grayscale image.
    std::vector<std::vector<cv::Point2f>> rejectedCorners;
    detector.detectMarkers(image, corners, ids);

    // Refine the corners using cornerSubPix.
    for (auto &corner : corners)
    {
        cv::cornerSubPix(image, corner, cv::Size(3, 3), cv::Size(-1, -1), cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
    }
}

int main()
{
    // Set camera parameters
    int capture_width = 1640;
    int capture_height = 1232;
    int display_width = 1640;
    int display_height = 1232;
    int framerate = 30;
    int flip_method = 2;
    int sensor_mode = 3;

    std::string pipeline = gstreamer_pipeline(capture_width, capture_height, display_width, display_height, framerate, flip_method, sensor_mode);
    std::cout << "Using pipeline: \n\t" << pipeline << "\n";

    // Load the camera matrix and distortion coefficients from the XML file.
    cv::Mat cameraMatrix, distCoeffs;
    if (!loadCameraCalibration("calibration_params.xml", cameraMatrix, distCoeffs))
    {
        std::cout << "Failed to load camera calibration parameters." << std::endl;
        return -1;
    }

    // Set the marker length.
    float markerLength = 0.10;

    // Create a 4x1 coordinate system for the marker.
    cv::Mat objPoints(4, 1, CV_32FC3);
    objPoints.ptr<cv::Vec3f>(0)[0] = cv::Vec3f(-markerLength / 2.f, markerLength / 2.f, 0);
    objPoints.ptr<cv::Vec3f>(0)[1] = cv::Vec3f(markerLength / 2.f, markerLength / 2.f, 0);
    objPoints.ptr<cv::Vec3f>(0)[2] = cv::Vec3f(markerLength / 2.f, -markerLength / 2.f, 0);
    objPoints.ptr<cv::Vec3f>(0)[3] = cv::Vec3f(-markerLength / 2.f, -markerLength / 2.f, 0);

    // Create a detector and dictionary.
    cv::aruco::DetectorParameters detectorParams = cv::aruco::DetectorParameters();
    cv::aruco::Dictionary dictionary = cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);
    cv::aruco::ArucoDetector detector(dictionary, detectorParams);

    // Create a VideoCapture object to grab frames from the camera.
    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    if (!cap.isOpened())
    {
        std::cout << "Failed to open camera." << std::endl;
        return -1;
    }

    double total_time = 0;
    int count_loop = 0;

    // Main loop to grab frames and detect markers.
    while (true)
    {
        count_loop += 1;
        // Start timer
        double start_time = static_cast<double>(cv::getTickCount());

        cv::Mat image;
        if (!cap.read(image))
        {
            std::cout << "Capture read error" << std::endl;
            break;
        }

        // Detect the markers in the image.
        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f>> corners;
        detectCorners(image, detector, ids, corners);

        // If at least one marker was detected, draw it and its pose.
        if (ids.size() > 0)
        {
            cv::aruco::drawDetectedMarkers(image, corners, ids);

            // Initialize rvecs and tvecs.
            std::vector<cv::Vec3d> rvecs(ids.size());
            std::vector<cv::Vec3d> tvecs(ids.size());

            // Calculate the pose of each marker.
            for (int i = 0; i < ids.size(); i++)
            {
                cv::solvePnP(objPoints, corners[i], cameraMatrix, distCoeffs, rvecs[i], tvecs[i]);
                // One thing to note is that solvePnP returns the world coordinates relative to the camera
            }
            for (int j = 0; j < ids.size(); j++)
            {
                // Check if the current marker has ID 13
                if (ids[j] == 13)
                {
                    // Extract x, y, z from the translation vector (tvec).
                    double x = tvecs[j][0];
                    double y = tvecs[j][1];
                    double z = tvecs[j][2];

                    // Calculate yaw, pitch, and roll from the rotation vector (rvec).
                    cv::Mat rotMat;
                    cv::Rodrigues(rvecs[j], rotMat);

                    double yaw, pitch, roll;

                    // https://stackoverflow.com/questions/11514063/extract-yaw-pitch-and-roll-from-a-rotationmatrix

                    /*
                    Since Opencv uses the pinhole camera model(z-looking out of the camera,
                    x-to the right, y- down),yaw pitch and roll arent the same. Here, yaw will be the
                    rotation around the y axis, pitch the rotation around the x axis and roll the
                    rotation around the z axis

                    Normally, we would calculate like this:

                    // Calculate yaw (around z-axis).
                    yaw = atan2(rotMat.at<double>(1, 0), rotMat.at<double>(0, 0));

                    // Calculate pitch (around y-axis).
                    pitch = atan2(-rotMat.at<double>(2, 0), std::sqrt(rotMat.at<double>(2, 1) * rotMat.at<double>(2, 1) + rotMat.at<double>(2, 2) * rotMat.at<double>(2, 2)));

                    // Calculate roll (around x-axis).
                    roll = atan2(-rotMat.at<double>(2, 1), rotMat.at<double>(2, 2));

                    But we must change some of these to be coherent with the pinhole model:
                    roll becomes pitch, pitch becomes yaw and yaw becomes roll
                    */

                    // Using pinhole camera model:
                    // Calculate roll (around z-axis).
                    roll = atan2(rotMat.at<double>(1, 0), rotMat.at<double>(0, 0));

                    // Calculate yaw (around y-axis).
                    yaw = atan2(-rotMat.at<double>(2, 0), std::sqrt(rotMat.at<double>(2, 1) * rotMat.at<double>(2, 1) + rotMat.at<double>(2, 2) * rotMat.at<double>(2, 2)));

                    // Calculate pitch (around x-axis).
                    pitch = atan2(-rotMat.at<double>(2, 1), rotMat.at<double>(2, 2));

                    // Convert yaw, pitch, and roll to degrees.
                    double yaw_degrees = yaw * (180.0 / CV_PI);
                    double pitch_degrees = pitch * (180.0 / CV_PI);
                    double roll_degrees = roll * (180.0 / CV_PI);

                    //Print the measurements             
                    std::cout << "x: " << x << std::endl;
                    std::cout << "y: " << y << std::endl;
                    std::cout << "z: " << z << std::endl;
                    std::cout << "yaw: " << yaw_degrees << std::endl;
                    std::cout << "roll: " << roll_degrees << std::endl;
                    std::cout << "pitch: " << pitch_degrees << std::endl;

                   
                }
            }
            
           
            // Draw the pose of each marker.
            // Since I am no longer showing the image, this part of the code is no longer needed, so I commented it out.
            /*
            for (int i = 0; i < ids.size(); i++)
            {
                cv::drawFrameAxes(image, cameraMatrix, distCoeffs, rvecs[i], tvecs[i], 0.1);
            }
            */
        }

        // Show the image with detected markers.
        // cv::imshow("Image", image);

        // Calculate elapsed time in seconds
        double elapsed_time = (static_cast<double>(cv::getTickCount()) - start_time) / cv::getTickFrequency();
        std::cout << "Elapsed Time: " << elapsed_time << " seconds" << std::endl;
        total_time += elapsed_time;

        // Break the loop if 'q' key is pressed.
        if ((cv::waitKey(1) == 'q') || (total_time >= 30))
        {
            break;
        }
    }

    //Calculate average time for each iteration of the loop
    std::cout << "Average time= " << total_time / count_loop << std::endl;

    cap.release();
    cv::destroyAllWindows();
    return 0;
}

EDIT: The problem doesn't seem to come from the ArUco logic itself. Without the ArUco logic, the delay is ever so slightly smaller (for example, when I set the time in cv::waitKey() to 1 ms, the delay is ~0.5 s, maybe a bit less). I can tell the problem is still present without the ArUco logic because when I change the time in cv::waitKey() to 1000 ms, the same ~12 second delay is there (if removing the ArUco logic made a difference in this case, I can't notice it). Here is the updated code:

#include <opencv2/opencv.hpp>
#include <iostream>


std::string gstreamer_pipeline(int capture_width, int capture_height, int display_width, int display_height, int framerate, int flip_method, int sensor_mode)
{
    return "nvarguscamerasrc sensor_mode=" + std::to_string(sensor_mode) + " ! video/x-raw(memory:NVMM), width=(int)" + std::to_string(capture_width) + ", height=(int)" +
           std::to_string(capture_height) + ", framerate=(fraction)" + std::to_string(framerate) +
           "/1 ! nvvidconv flip-method=" + std::to_string(flip_method) + " ! video/x-raw, width=(int)" + std::to_string(display_width) + ", height=(int)" +
           std::to_string(display_height) + ", format=(string)GRAY8 ! videoconvert ! video/x-raw, format=(string)GRAY8 ! appsink";
}

int main()
{
    // Set camera parameters
    int capture_width = 1640;
    int capture_height = 1232;
    int framerate = 30;
    int flip_method = 2;
    int sensor_mode = 3;

    std::string pipeline = gstreamer_pipeline(capture_width, capture_height, capture_width, capture_height, framerate, flip_method, sensor_mode);
    std::cout << "Using pipeline: \n\t" << pipeline << "\n";

    // Create a VideoCapture object to grab frames from the camera.
    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    if (!cap.isOpened())
    {
        std::cout << "Failed to open camera." << std::endl;
        return -1;
    }


    cv::namedWindow("Image", cv::WINDOW_NORMAL);  // Create a window for displaying the image

    // Main loop to grab frames, display them, and check for user input.
    while (true)
    {
        cv::Mat image;
        if (!cap.read(image))
        {
            std::cout << "Capture read error" << std::endl;
            break;
        }

        // Display the captured image
        cv::imshow("Image", image);

        // Check if "q" key is pressed using cv::waitKey
        if (cv::waitKey(1) == 'q')
        {
            break;
        }
    }


    cap.release();
    cv::destroyAllWindows();
    return 0;
}

EDIT 2:

If I set the time in cv::waitKey to 2000 ms, it still takes the same 12 frames for the movement to show up in the image. Maybe there is a buffer somewhere that causes this delay? I tried calling cap.set(cv::CAP_PROP_BUFFERSIZE, 1); right after cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);, but that didn't change anything.
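
For reference, this is the attempt that had no effect (my guess is that CAP_PROP_BUFFERSIZE is simply ignored by the GStreamer backend, but I am not sure):

cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
cap.set(cv::CAP_PROP_BUFFERSIZE, 1); // no visible effect with this pipeline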

  • Did you check if it's not a problem with the gstreamer pipeline itself? What if you remove all aruco logic and simply try to display frames every second (e.g. with 1 second sleep in the loop)? If you experience 12 second delay, then this sounds like slow consumer problem. – pptaszni Aug 09 '23 at 14:05
  • @pptaszni When I delete all aruco logic, I still experience the ~12 second delay. I don't really know what the consumer is or how it works, could you give me a direction as to where to look in order to find a solution? Thanks for the help. – Guilherme Pinto Aug 09 '23 at 14:26
  • https://zguide.zeromq.org/docs/chapter5/ - slow consumer; https://stackoverflow.com/questions/15327322/fast-producer-and-slow-consumer ; but it's just my guess; try to change the protocol to drop some frames, or always send the newest full frame instead of delta; I don't know how gstreamer works in your config – pptaszni Aug 09 '23 at 15:00
  • since you experience the delay without aruco, aruco is not to blame. please present a [mre] – Christoph Rackwitz Aug 09 '23 at 15:05
  • @ChristophRackwitz I have edited my post, thank you for the reminder. – Guilherme Pinto Aug 09 '23 at 15:24
  • Probably not the problem, but just in case; are you building your code with compiler optimizations *enabled*? – Jesper Juhl Aug 09 '23 at 15:24
  • @JesperJuhl I don't think so, I am using cmake and in the CMakeLists.txt file there is nothing related to what you mentioned – Guilherme Pinto Aug 10 '23 at 09:12
  • @pptaszni Editing the pipeline to drop frames worked! The delay has been reduced to a completely acceptable level and is what I would expect. Thank you so much. If you want to, post an answer so that I can accept it! – Guilherme Pinto Aug 10 '23 at 10:49
  • @GuilhermePinto I think it's better if you post an answer to your own question describing what exactly had to be changed in the pipeline configuration, then we can upvote it. – pptaszni Aug 10 '23 at 10:53
  • @pptaszni Done! thank you. – Guilherme Pinto Aug 10 '23 at 11:00
  • Compiler optimizations enabled is also commonly known as a "Release" build (as opposed to an unoptimized "Debug" build). Release builds can in some cases perform 10 (or even 100) times better than unoptimized debug builds. So I was just wondering if you tested an optimized build since it can make a huge performance difference in some cases. – Jesper Juhl Aug 10 '23 at 11:22
  • @JesperJuhl I will look into that to test further improvements, thank you. – Guilherme Pinto Aug 10 '23 at 12:04

1 Answer


SOLUTION:

I followed @pptaszni's (https://stackoverflow.com/users/4165552/pptaszni) suggestion of changing the pipeline to drop frames, and the delay has been reduced to a completely acceptable level, which is what I would expect. I don't really have a reliable way of measuring it, but if I had to guess it is probably around ~200 ms with all of the ArUco logic included. Here is the updated pipeline function:

std::string gstreamer_pipeline(int capture_width, int capture_height, int display_width, int display_height, int framerate, int flip_method, int sensor_mode)
{
    return "nvarguscamerasrc sensor_mode=" + std::to_string(sensor_mode) + " ! video/x-raw(memory:NVMM), width=(int)" + std::to_string(capture_width) + ", height=(int)" +
           std::to_string(capture_height) + ", framerate=(fraction)" + std::to_string(framerate) +
           "/1 ! nvvidconv flip-method=" + std::to_string(flip_method) + " ! video/x-raw, width=(int)" + std::to_string(display_width) + ", height=(int)" +
           std::to_string(display_height) + ", format=(string)GRAY8 ! videoconvert ! video/x-raw, format=(string)GRAY8 ! appsink drop=true sync=false";
}
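
My understanding of why this works (an assumption on my part, based on the GStreamer appsink property descriptions, not something I have verified beyond my setup): drop=true lets appsink discard the oldest queued buffers when its internal queue fills up instead of stalling the upstream elements, and sync=false stops appsink from pacing buffers against the pipeline clock, so OpenCV always pulls the freshest frame. appsink also has a max-buffers property, so something like this should keep at most one frame queued:

... ! appsink drop=true sync=false max-buffers=1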

Thanks everyone!