22

I'm trying to figure out what technologies I would need to process images for characters.

Specifically, in this example, I need to extract the hashtag that is circled. You can see it here:

enter image description here

Any implementations would be of great assistance.

karlphillip
somejkuser
  • I don't understand. Do you want to extract the different hashtags? You can do that using OCR. Do you want to detect the circle enclosing this certain hashtag? – GilLevi Dec 07 '13 at 19:00
  • @GilLevi I'd like to detect the circle enclosing the hashtag and then extract the hashtag that's within the circle. – somejkuser Dec 08 '13 at 23:54
  • I'm not sure. If it was a rectangle, you could just apply thresholding + line detection (Hough transform or line segment detector) and try to find 4 lines that cross at 90-degree angles. – GilLevi Dec 09 '13 at 15:07
  • Do you want to recognize the letters F, O, A, and M? Or do you just want to extract the frame with FOAM inside while dropping all the other regions with letters? – lennon310 Dec 11 '13 at 15:46
  • @lennon310 Yes, I want to recognize the letters FOAM because they exist within the circle. – somejkuser Dec 12 '13 at 15:39
  • never use click here in link... – ViliusL Dec 13 '13 at 12:04
  • I clicked through from your image to the website that's hosting it. Did you read the bit that says [Nike reserves the right to ban any participants who make threats, harass or attempt to cheat or abuse the process by any means, **including use of programs or scripts that provide them an unfair advantage**.](http://help-us.nikeinc.com/app/answers/detail/a_id/22897/kw/twitter). Aren't you just wasting your time here? – r3mainer Dec 16 '13 at 01:56
  • He probably is, but now that I've answered it... I want my reward! :D – karlphillip Dec 17 '13 at 02:28

4 Answers

54

It is possible to solve this problem with OpenCV + Tesseract, though I think there might be easier ways. OpenCV is an open source library used to build computer vision applications and Tesseract is an open source OCR engine.

Before we start, let me clarify something: that is not a circle, it's a rounded rectangle.

I'm sharing the source code of the application that I wrote to demonstrate how the problem can be solved, as well as some tips on what's going on. This answer is not supposed to educate anybody on digital image processing, and the reader is expected to have a minimal understanding of this field.

I will describe very briefly what the larger sections of the code do. Most of the next chunk of code came from squares.cpp, a sample application that is shipped with OpenCV to detect squares in images.

#include <cmath>
#include <iostream>
#include <vector>

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// angle: helper function.
// Finds a cosine of angle between vectors from pt0->pt1 and from pt0->pt2.
double angle( cv::Point pt1, cv::Point pt2, cv::Point pt0 )
{
    double dx1 = pt1.x - pt0.x;
    double dy1 = pt1.y - pt0.y;
    double dx2 = pt2.x - pt0.x;
    double dy2 = pt2.y - pt0.y;
    return (dx1*dx2 + dy1*dy2)/sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}

// findSquares: detects the squares in the image.
// Each detected square is appended to the `squares` vector as 4 vertices.
void findSquares(const cv::Mat& image, std::vector<std::vector<cv::Point> >& squares)
{  
    cv::Mat pyr, timg;

    // Down-scale and up-scale the image to filter out small noises
    cv::pyrDown(image, pyr, cv::Size(image.cols/2, image.rows/2));
    cv::pyrUp(pyr, timg, image.size());

    // Apply Canny with hysteresis thresholds 0 and 50 (aperture size 5)
    cv::Canny(timg, timg, 0, 50, 5);

    // Dilate canny output to remove potential holes between edge segments
    cv::dilate(timg, timg, cv::Mat(), cv::Point(-1,-1));

    // find contours and store them all as a list 
    std::vector<std::vector<cv::Point> > contours;           
    cv::findContours(timg, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

    for( size_t i = 0; i < contours.size(); i++ ) // Test each contour
    {
        // Approximate contour with accuracy proportional to the contour perimeter
        std::vector<cv::Point> approx;   
        cv::approxPolyDP(cv::Mat(contours[i]), approx, cv::arcLength(cv::Mat(contours[i]), true)*0.02, true);

        // Square contours should have 4 vertices after approximation
        // relatively large area (to filter out noisy contours)
        // and be convex.
        // Note: absolute value of an area is used because
        // area may be positive or negative - in accordance with the
        // contour orientation
        if( approx.size() == 4 &&
            fabs(cv::contourArea(cv::Mat(approx))) > 1000 &&
            cv::isContourConvex(cv::Mat(approx)) )
        {
            double maxCosine = 0;

            for (int j = 2; j < 5; j++)
            {
                // Find the maximum cosine of the angle between joint edges
                double cosine = fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
                maxCosine = MAX(maxCosine, cosine);
            }

            // If cosines of all angles are small
            // (all angles are ~90 degrees) then write quadrangle
            // vertices to resultant sequence
            if( maxCosine < 0.3 )
                squares.push_back(approx);
        }
    }         
}


// drawSquares: function draws all the squares found in the image
void drawSquares( cv::Mat& image, const std::vector<std::vector<cv::Point> >& squares )
{
    for( size_t i = 0; i < squares.size(); i++ )
    {
        const cv::Point* p = &squares[i][0];
        int n = (int)squares[i].size();
        cv::polylines(image, &p, &n, 1, true, cv::Scalar(0,255,0), 2, CV_AA);
    }

    cv::imshow("drawSquares", image);
}
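
The corner test inside findSquares can be tried in isolation. The following is a minimal sketch that uses only the standard library, so it runs without OpenCV; the names `Pt`, `cornerCosine`, `maxCornerCosine` and `looksRectangular` are made up for this illustration and are not part of squares.cpp. Unlike the loop above, which inspects three corners, the sketch checks all four. A limit of 0.3 on the cosine accepts corner angles between roughly 72.5 and 107.5 degrees.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical stand-in for cv::Point, used only in this sketch.
struct Pt { double x, y; };

// Cosine of the angle at p0 between the rays p0->p1 and p0->p2
// (same formula as the angle() helper above).
double cornerCosine(Pt p1, Pt p2, Pt p0)
{
    double dx1 = p1.x - p0.x, dy1 = p1.y - p0.y;
    double dx2 = p2.x - p0.x, dy2 = p2.y - p0.y;
    return (dx1 * dx2 + dy1 * dy2) /
           std::sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2) + 1e-10);
}

// Largest |cosine| over the four corners of a quadrilateral whose
// vertices are given in order (clockwise or counter-clockwise).
double maxCornerCosine(const std::array<Pt, 4>& q)
{
    double maxCos = 0.0;
    for (int j = 0; j < 4; ++j)
    {
        // Corner j together with its two neighbouring vertices.
        double c = std::fabs(cornerCosine(q[(j + 1) % 4], q[(j + 3) % 4], q[j]));
        maxCos = std::max(maxCos, c);
    }
    return maxCos;
}

// True when every corner is close enough to 90 degrees.
bool looksRectangular(const std::array<Pt, 4>& q, double cosineLimit = 0.3)
{
    return maxCornerCosine(q) < cosineLimit;
}
```

For example, the axis-aligned rectangle (0,0), (4,0), (4,2), (0,2) passes the test, while the parallelogram (0,0), (4,0), (6,2), (2,2) is rejected because its corners are at 45 and 135 degrees.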

Ok, so our program begins at:

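int main(int argc, char* argv[])
{
// Expect the path of the input image as the first argument
if (argc < 2)
{
    std::cout << "Usage: " << argv[0] << " <image>" << std::endl;
    return -1;
}

// Load input image (colored, 3-channel)
cv::Mat input = cv::imread(argv[1]);
if (input.empty())
{
    std::cout << "!!! failed imread()" << std::endl;
    return -1;
}   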

// Convert input image to grayscale (1-channel)
cv::Mat grayscale;
cv::cvtColor(input, grayscale, cv::COLOR_BGR2GRAY);
//cv::imwrite("gray.png", grayscale);

What grayscale looks like:

// Threshold to binarize the image and get rid of the shoe
cv::Mat binary;
cv::threshold(grayscale, binary, 225, 255, cv::THRESH_BINARY_INV);
cv::imshow("Binary image", binary);
//cv::imwrite("binary.png", binary);
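
As a rough illustration of what the cv::threshold call above does with THRESH_BINARY_INV, here is a standard-library-only sketch of the per-pixel rule. The function name `thresholdBinaryInv` and the flat vector of 8-bit pixels are assumptions made for this example, not OpenCV's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inverse binary threshold on 8-bit pixels:
// values strictly above `thresh` become 0, all others become `maxval`.
std::vector<std::uint8_t> thresholdBinaryInv(const std::vector<std::uint8_t>& src,
                                             std::uint8_t thresh,
                                             std::uint8_t maxval)
{
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = (src[i] > thresh) ? 0 : maxval;
    return dst;
}
```

With the values used above (225 and 255), only pixels brighter than 225 end up black; everything darker than that turns white and becomes foreground for the contour search.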

What binary looks like:

// Find the contours in the thresholded image
std::vector<std::vector<cv::Point> > contours;
cv::findContours(binary, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

// Fill the areas of the contours with BLUE (hoping to erase everything inside a rectangular shape)
cv::Mat blue = input.clone();      
for (size_t i = 0; i < contours.size(); i++)
{
    std::vector<cv::Point> cnt = contours[i];
    double area = cv::contourArea(cv::Mat(cnt));               

    //std::cout << "* Area: " << area << std::endl; 
    cv::drawContours(blue, contours, i, cv::Scalar(255, 0, 0), 
                     CV_FILLED, 8, std::vector<cv::Vec4i>(), 0, cv::Point() );         
}       

cv::imshow("Contours Filled", blue);  
//cv::imwrite("contours.png", blue);  

What blue looks like:

// Convert the blue colored image to binary (again), and we will have a good rectangular shape to detect
cv::Mat gray;
cv::cvtColor(blue, gray, cv::COLOR_BGR2GRAY);
cv::threshold(gray, binary, 225, 255, cv::THRESH_BINARY_INV);
cv::imshow("binary2", binary);
//cv::imwrite("binary2.png", binary);

What binary looks like at this point:

// Erode & Dilate to isolate segments connected to nearby areas
int erosion_type = cv::MORPH_RECT; 
int erosion_size = 5;
cv::Mat element = cv::getStructuringElement(erosion_type, 
                                            cv::Size(2 * erosion_size + 1, 2 * erosion_size + 1), 
                                            cv::Point(erosion_size, erosion_size));
cv::erode(binary, binary, element);
cv::dilate(binary, binary, element);
cv::imshow("Morphologic Op", binary); 
//cv::imwrite("morpho.png", binary);
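
To see why an erosion followed by a dilation with the same element isolates a shape from thin connections, here is a small sketch on a plain binary grid. It uses only the standard library; `Grid`, `morph` and `opening` are names invented for this example, and ignoring out-of-bounds neighbours is an assumption that may not match OpenCV's border handling in every configuration.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<std::vector<int> > Grid; // 1 = foreground, 0 = background

// Slides a (2r+1)x(2r+1) square window over the grid and keeps the minimum
// (erosion) or the maximum (dilation) of the pixels under it.
// Assumption: neighbours that fall outside the grid are simply ignored.
Grid morph(const Grid& in, int r, bool erode)
{
    const int rows = static_cast<int>(in.size());
    const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
    Grid out(rows, std::vector<int>(cols, 0));

    for (int y = 0; y < rows; ++y)
    {
        for (int x = 0; x < cols; ++x)
        {
            int value = erode ? 1 : 0;
            for (int dy = -r; dy <= r; ++dy)
            {
                for (int dx = -r; dx <= r; ++dx)
                {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= rows || xx < 0 || xx >= cols)
                        continue;
                    value = erode ? std::min(value, in[yy][xx])
                                  : std::max(value, in[yy][xx]);
                }
            }
            out[y][x] = value;
        }
    }
    return out;
}

// Erosion followed by dilation with the same square element.
Grid opening(const Grid& in, int r)
{
    return morph(morph(in, r, true), r, false);
}
```

Given two 3x3 blocks joined by a one-pixel-wide bridge, `opening(grid, 1)` removes the bridge and restores both blocks. The code above uses an 11x11 element (erosion_size = 5), which corresponds to r = 5 in this sketch.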

What binary looks like at this point:

// Ok, let's go ahead and try to detect all rectangular shapes
std::vector<std::vector<cv::Point> > squares;
findSquares(binary, squares);
std::cout << "* Rectangular shapes found: "  << squares.size() << std::endl;

// Draw all rectangular shapes found
cv::Mat output = input.clone();
drawSquares(output, squares);
//cv::imwrite("output.png", output);

What output looks like:

Alright! We solved the first part of the problem which was finding the rounded rectangle. You can see in the image above that the rectangular shape was detected and green lines were drawn over the original image for educational purposes.

The second part is much easier. It begins by creating a ROI (Region of Interest) in the original image so we can crop the image to the area inside the rounded rectangle. Once this is done, the cropped image is saved on the disk as a TIFF file, which is then fed to Tesseract to do its magic:

// Crop the rectangular shape
if (squares.size() == 1)
{    
    cv::Rect box = cv::boundingRect(cv::Mat(squares[0]));
    std::cout << "* The location of the box is x:" << box.x << " y:" << box.y << " " << box.width << "x" << box.height << std::endl;

    // Crop the original image to the defined ROI
    cv::Mat crop = input(box);
    cv::imshow("crop", crop);
    // Save the crop to disk so that Tesseract can read it
    cv::imwrite("cropped.tiff", crop);
}
else
{
    std::cout << "* Abort! Expected exactly one rectangle, found " << squares.size() << "." << std::endl;
}

// Wait until user presses key
cv::waitKey(0);

return 0;
}

What crop looks like:

When this application finishes its job, it creates a file named cropped.tiff on the disk. Go to the command-line and invoke Tesseract to detect the text present on the cropped image:

tesseract cropped.tiff out

This command creates a file named out.txt with the detected text:

enter image description here

Tesseract has an API that you can use to add the OCR feature into your application.
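
Short of linking against that API, one simple way to get OCR into the program is to run the same command line from C++ and read the resulting text file. The sketch below takes that route under a few assumptions: the `tesseract` executable is on the PATH, the helper names `buildTesseractCommand` and `runTesseract` are invented for this example, and the naive double-quoting does not handle file names that themselves contain quotes.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Builds the command shown above: tesseract <image> <outputBase>
// Assumption: neither argument contains a double quote.
std::string buildTesseractCommand(const std::string& image,
                                  const std::string& outputBase)
{
    return "tesseract \"" + image + "\" \"" + outputBase + "\"";
}

// Runs the command and returns the contents of <outputBase>.txt.
// Returns an empty string if the command fails or the file cannot be opened.
std::string runTesseract(const std::string& image, const std::string& outputBase)
{
    if (std::system(buildTesseractCommand(image, outputBase).c_str()) != 0)
        return std::string();

    std::ifstream in((outputBase + ".txt").c_str());
    if (!in)
        return std::string();

    std::stringstream text;
    text << in.rdbuf();
    return text.str();
}
```

Calling `runTesseract("cropped.tiff", "out")` right after the crop is written would return whatever Tesseract stored in out.txt. Using the library API directly avoids the temporary files, at the cost of an extra build dependency.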

This solution is not robust and you will probably have to make some changes here and there to make it work for other test cases.

karlphillip
  • Isn't the rectangle added by the author? Is it included in the original hashtag? Without the rectangle, it may be a challenge to differentiate FOAM from the other letters. – lennon310 Dec 17 '13 at 02:32
  • The orange rectangle was indeed added by the author. The other one which this application detects I believe to be part of the original image. At least I hope so. – karlphillip Dec 17 '13 at 02:34
  • Yes that would simplify the problem. Yet this is a very nice and effective approach. Good job, Thank you! – lennon310 Dec 17 '13 at 02:39
  • Did you receive the full bounty? I've been out of town due to an emergency and I noticed that I missed the bounty signoff. – somejkuser Dec 23 '13 at 02:24
  • @jkushner Hi. When the period of the bounty expires, the site automatically gives 50% of the reward to the most voted answer. So to answer your question, I only received 50%. But that's not a problem and I hope the answer helped you. See you around! – karlphillip Dec 23 '13 at 02:46
  • @karlphillip How would you identify the numbers in a multi-dimensional array of contours when scanning a credit card? I am not able to isolate the region so that only the card number remains in the sample image, which I could later pass to Tesseract to recognize the numbers. Can you help me with it? I am not able to differentiate between the card logo, numbers and text. Thanks! – Ajay Sharma May 21 '14 at 09:20
  • @AjaySharma Keep an eye open for this thread: http://stackoverflow.com/q/23706394/176769 – karlphillip May 22 '14 at 22:10
  • @karlphillip Thanks, I will look out on this post. Hope things would be clear now :) – Ajay Sharma May 23 '14 at 05:26
  • Thank you Sir :) Excellent answer :) Looking for this type of detailed example. :) Now i found it. :) – Muhammad Hashim Shafiq May 21 '16 at 06:13
3

There are a few alternatives: Java OCR implementation

They mention the following tools:

And a few others.

This list of links can also be useful: http://www.javawhat.com/showCategory.do?id=2138003

Generally this kind of task requires lots of trial and testing. The best tool probably depends much more on the profile of your input data than on anything else.

Community
Lajos Veres
3

You can check this article: http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C

Contour Analysis Demo

It comes with the math theory and an implementation in C# + OpenCV (unfortunately not Java, but there is not that much to rewrite if you decide to implement it in Java). You will have to use Visual Studio and rebuild against your OpenCV version if you would like to test it, but it's worth it.

Dabo
0

OCR works well with scanned documents. What you are referring to is text detection in general images, which requires other techniques (sometimes OCR is used as part of the flow).

I'm not aware of any "production ready" implementations.

For general information, try Google Scholar with: "text detection in images".

A specific method that worked well for me is the 'stroke width transform' (SWT). It's not hard to implement, and I believe that there are also some implementations available online.

Ophir Yoktan