Problem
I am fairly new to computer vision in general. I am currently trying to calculate a homography from two images so I can use it to correct the perspective of one image to match the other. However, the matches I am getting are simply bad and wrong, so the homographic warp I end up with is completely off.
Current state
I am using EmguCV as a C# wrapper around OpenCV. So far my code at least runs and seems to work "properly".
I load my two images and declare some variables to store calculation artifacts.
(Image<Bgr, byte> Image, VectorOfKeyPoint Keypoints, Mat Descriptors) imgModel = (new Image<Bgr, byte>(imageFolder + "image0.jpg").Resize(0.2, Emgu.CV.CvEnum.Inter.Area), new VectorOfKeyPoint(), new Mat());
(Image<Bgr, byte> Image, VectorOfKeyPoint Keypoints, Mat Descriptors) imgTest = (new Image<Bgr, byte>(imageFolder + "image1.jpg").Resize(0.2, Emgu.CV.CvEnum.Inter.Area), new VectorOfKeyPoint(), new Mat());
Mat imgKeypointsModel = new Mat();
Mat imgKeypointsTest = new Mat();
Mat imgMatches = new Mat();
Mat imgWarped = new Mat();
VectorOfVectorOfDMatch matches = new VectorOfVectorOfDMatch();
VectorOfVectorOfDMatch filteredMatches = new VectorOfVectorOfDMatch();
List<MDMatch[]> filteredMatchesList = new List<MDMatch[]>();
Notice that I use a ValueTuple<Image,VectorOfKeyPoint,Mat>
to store the images directly with their respective Keypoints and Descriptors.
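The same grouping could just as well be a small helper class instead of a tuple; here is an equivalent sketch (the name FeatureImage is made up purely for illustration and is not part of my code):
// Bundles an image with the keypoints and descriptors computed for it,
// mirroring the ValueTuple<Image<Bgr, byte>, VectorOfKeyPoint, Mat> above.
class FeatureImage
{
    public Image<Bgr, byte> Image { get; set; }
    public VectorOfKeyPoint Keypoints { get; } = new VectorOfKeyPoint();
    public Mat Descriptors { get; } = new Mat();
}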
After this I use an ORB detector and a brute-force matcher to detect, describe and match the keypoints:
ORBDetector detector = new ORBDetector();
BFMatcher matcher = new BFMatcher(DistanceType.Hamming2);
detector.DetectAndCompute(imgModel.Image, null, imgModel.Keypoints, imgModel.Descriptors, false);
detector.DetectAndCompute(imgTest.Image, null, imgTest.Keypoints, imgTest.Descriptors, false);
matcher.Add(imgTest.Descriptors);
matcher.KnnMatch(imgModel.Descriptors, matches, k: 2, mask: null);
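One detail that is easy to get wrong here is which side ends up as "query" and which as "train": since imgTest.Descriptors is added to the matcher and imgModel.Descriptors is passed to KnnMatch, each resulting MDMatch should index the model keypoints through QueryIdx and the test keypoints through TrainIdx. A small sanity-check sketch under that assumption (not part of the pipeline itself):
// Print the first few correspondences to verify the QueryIdx/TrainIdx mapping.
// QueryIdx should refer to imgModel (passed to KnnMatch), TrainIdx to imgTest
// (added via matcher.Add).
MKeyPoint[] modelKeyArr = imgModel.Keypoints.ToArray();
MKeyPoint[] testKeyArr = imgTest.Keypoints.ToArray();
MDMatch[][] rawMatches = matches.ToArrayOfArray();
for (int i = 0; i < Math.Min(5, rawMatches.Length); i++)
{
    MDMatch best = rawMatches[i][0];
    Console.WriteLine($"model {modelKeyArr[best.QueryIdx].Point} -> test {testKeyArr[best.TrainIdx].Point} (distance {best.Distance})");
}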
After this I apply the ratio test and do some further filtering by using a match-distance threshold.
MDMatch[][] matchesArray = matches.ToArrayOfArray();
//Apply ratio test
for (int i = 0; i < matchesArray.Length; i++)
{
float dist1 = matchesArray[i][0].Distance;
float dist2 = matchesArray[i][1].Distance;
if (dist1 < ms_MIN_RATIO * dist2)
{
filteredMatchesList.Add(matchesArray[i]);
}
}
//Filter by threshold
MDMatch[][] defCopy = new MDMatch[filteredMatchesList.Count][];
filteredMatchesList.CopyTo(defCopy);
filteredMatchesList = new List<MDMatch[]>();
foreach (var item in defCopy)
{
if (item[0].Distance < ms_MAX_DIST)
{
filteredMatchesList.Add(item);
}
}
filteredMatches = new VectorOfVectorOfDMatch(filteredMatchesList.ToArray());
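The two filter passes could also be condensed into a single LINQ expression; this sketch should behave identically to the loops above (it assumes System.Linq is available and that every entry really has two neighbours, i.e. k = 2):
// Ratio test and absolute distance threshold in one pass.
MDMatch[][] condensed = matches.ToArrayOfArray()
    .Where(m => m[0].Distance < ms_MIN_RATIO * m[1].Distance) // Lowe-style ratio test
    .Where(m => m[0].Distance < ms_MAX_DIST)                  // absolute distance cut-off
    .ToArray();
filteredMatches = new VectorOfVectorOfDMatch(condensed);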
Disabling either of these filter steps (i.e. just keeping all matches) doesn't make my results noticeably better or worse, but they seem to make sense, so I keep them.
In the end I calculate the homography from the filtered matches, warp the image with it, and draw some debug images:
Mat homography = Features2DToolbox.GetHomographyMatrixFromMatchedFeatures(imgModel.Keypoints, imgTest.Keypoints, filteredMatches, null, 10);
CvInvoke.WarpPerspective(imgTest.Image, imgWarped, homography, imgTest.Image.Size);
Features2DToolbox.DrawKeypoints(imgModel.Image, imgModel.Keypoints, imgKeypointsModel, new Bgr(0, 0, 255));
Features2DToolbox.DrawKeypoints(imgTest.Image, imgTest.Keypoints, imgKeypointsTest, new Bgr(0, 0, 255));
Features2DToolbox.DrawMatches(imgModel.Image, imgModel.Keypoints, imgTest.Image, imgTest.Keypoints, filteredMatches, imgMatches, new MCvScalar(0, 255, 0), new MCvScalar(0, 0, 255));
//Task.Factory.StartNew(() => ImageViewer.Show(imgKeypointsModel, "Keypoints Model"));
//Task.Factory.StartNew(() => ImageViewer.Show(imgKeypointsTest, "Keypoints Test"));
Task.Factory.StartNew(() => ImageViewer.Show(imgMatches, "Matches"));
Task.Factory.StartNew(() => ImageViewer.Show(imgWarped, "Warp"));
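As an additional cross-check of GetHomographyMatrixFromMatchedFeatures, the homography can also be computed by building the point arrays manually and calling FindHomography directly. This sketch assumes the QueryIdx/TrainIdx interpretation from above and uses HomographyMethod.Ransac, which is how the RANSAC flag is named in the EmguCV 3.x API I am using (other versions name this enum differently):
// Build matched point pairs by hand (requires System.Linq) and estimate the
// homography that maps test-image points onto model-image points, i.e. the
// transform used to warp the test image into the model's frame.
MKeyPoint[] modelKeyArr = imgModel.Keypoints.ToArray();
MKeyPoint[] testKeyArr = imgTest.Keypoints.ToArray();
PointF[] modelPts = filteredMatchesList.Select(m => modelKeyArr[m[0].QueryIdx].Point).ToArray();
PointF[] testPts = filteredMatchesList.Select(m => testKeyArr[m[0].TrainIdx].Point).ToArray();
Mat manualHomography = CvInvoke.FindHomography(testPts, modelPts, Emgu.CV.CvEnum.HomographyMethod.Ransac, 10);
CvInvoke.WarpPerspective(imgTest.Image, imgWarped, manualHomography, imgTest.Image.Size);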
tl;dr: ORBDetector->BFMatcher->FilterMatches->GetHomography->WarpPerspective
Output
- Test whether the projection is going wrong (image 2)
- Using cross-check when matching (image 3)
Original images are 2448x3264 each and scaled by 0.2 before running any calculations on them.
Question
Basically it's as simple, yet as complex, as: what am I doing wrong? As you can see from the example above, my method of detecting features and matching them just seems to work extremely poorly. So I am asking whether someone can spot a mistake in my code, or give advice on why my results are so bad when there are hundreds of examples out on the internet showing how it works and how "easy" it is.
What I tried so far:
- Scaling of the input images. I generally get better results if I scale them down quite a bit.
- Detect more or less features. Default is 500 which is used currently. Increasing or decreasing this number didn't really make my results better.
- Various values of k, but anything other than k = 2 doesn't make sense to me, as I don't know how to modify the ratio test for k > 2.
- Varying the filter parameters, e.g. using a ratio of 0.6-0.9 for the ratio test.
- Using different pictures: QR-code, Silhouette of a dinosaur, some other random objects I had lying around my desk.
- Varying the re-projection threshold from 1 to 10 without any change in the result.
- Verifying that the projection itself is not faulty: feeding the algorithm the same image for both model and test, computing the homography, and warping the image with it. The image should not change. This worked as expected (see example image 2 and the sketch after this list).
- Using cross-check when matching (image 3). This looks a lot more promising but is still not really what I am expecting.
- Using other distance types: Hamming, Hamming2, L2Sqr (others are not supported).
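A rough sketch of how the identity check from the list above can be made numeric rather than visual (assuming homography was computed with imgModel fed in as both model and test image):
// When model and test are the same image, warping with the recovered homography
// should leave the image (nearly) unchanged; AbsDiff + Sum quantifies the change.
Mat selfWarped = new Mat();
CvInvoke.WarpPerspective(imgModel.Image, selfWarped, homography, imgModel.Image.Size);
Mat diff = new Mat();
CvInvoke.AbsDiff(imgModel.Image, selfWarped, diff);
MCvScalar residual = CvInvoke.Sum(diff);
Console.WriteLine($"Per-channel residual after self-warp: {residual.V0}, {residual.V1}, {residual.V2}");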
Examples I used:
https://www.learnopencv.com/image-alignment-feature-based-using-opencv-c-python/ (where I got the main structure of my code)
Original Images: The original images can be downloaded from here: https://drive.google.com/open?id=1Nlqv_0sH8t1wiH5PG-ndMxoYhsUbFfkC
Further Experiments since asking
I did some further research after asking. Most of the resulting changes are already included above, but I wanted to make a separate section for this one.
After running into so many problems and seemingly having nowhere to start, I decided to look up the original paper on ORB and try to replicate some of their results. While trying this I realised that even if I match the image against a version of itself rotated by a degree, the matches seem to look fine but the transformation still completely breaks down.
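To make that rotation experiment reproducible without taking a second photo, the rotated test image can be generated synthetically; here is a sketch of the idea (the 5-degree angle is an arbitrary choice of mine):
// Rotate the model image by a small, known angle around its centre and use the
// result as the test image. With a purely synthetic rotation, the recovered
// homography should come out close to this rotation, so a large deviation points
// at the matching/homography step rather than at the input photos.
double angleDeg = 5;
PointF center = new PointF(imgModel.Image.Width / 2f, imgModel.Image.Height / 2f);
Mat rotation = new Mat();
CvInvoke.GetRotationMatrix2D(center, angleDeg, 1.0, rotation);
Image<Bgr, byte> rotated = new Image<Bgr, byte>(imgModel.Image.Size);
CvInvoke.WarpAffine(imgModel.Image, rotated, rotation, imgModel.Image.Size);
// ...then run the exact same detect/match/warp pipeline with imgModel.Image as
// the model and 'rotated' as the test image, and compare against the known angle.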
Is it possible that my method of trying to replicate the perspective of an object is just wrong?
MCVE
https://drive.google.com/open?id=17DwFoSmco9UezHkON5prk8OsPalmp2MX (without packages, but nuget restore will be enough to get it to compile)