
I'm trying to extract text from an image with the Google Vision API, and that works. However, I only want the text from one part of the image.

This is the image I used:

My image

I only want to extract the text from maybank2u.com down to From Account:. I know there are tutorials that do this using blocks, but they are written in other programming languages.

My code:

<div class="row">
    <div class="col-12">
        <ol>
            <?php foreach ($text as $key => $texts): ?> 
                <li><h6> <?php echo ucfirst($texts->info()['description']) ?></h6><br><br> 
                </li>
            <?php endforeach ?>
        </ol>
    </div>
</div>

This code gets all the text from the image.

Output: [image]

2 Answers


The code below works for me. I have one PHP file, test.php, and one image file, /images/UUIPXl.png.

To get each line of text, I iterate the text annotations from Google Vision, and create an array of row items. Each of these has an x position and a text value.

I then sort each row by x position and concatenate to create a line of text.

Finally, the loop stops once it reaches the last line of text we want.

I get a result like so:

  • maybank2u.com
  • Open BillPayment
  • Status: Successful
  • Reference number: 2950211545
  • Transaction date: 01 Feb 2016 13:09:17
  • Amount: RM100.00
  • From Account 564155051577 WCA

The PHP code:

<?php 

    require 'vendor/autoload.php';
    use Google\Cloud\Vision\VisionClient;

    $config = ["keyFile" => json_decode(file_get_contents("./APIKey.json"), true) ];
    $vision = new VisionClient($config);

    $image = $vision->image(
        fopen('./images/UUIPXl.png', 'r'),
        ['TEXT_DETECTION']
    );

    $textAnnotations = $vision->annotate($image)->text();
    $rows = [];

    // Function used to sort our lines.
    function sortProc($a, $b)
    {
        if ($a["x"] === $b["x"]) {
            return 0;
        }
        return ($a["x"] < $b["x"]) ? -1 : 1;
    }

    // Remove first row (complete text).
    array_shift($textAnnotations);

    // We should calculate this, use a reasonable value to begin with.
    $lineHeight = 8;

    foreach ($textAnnotations as $text) {
        $key = round(((double)($text->info()["boundingPoly"]["vertices"][0]["y"]))/$lineHeight);
        $x = (int)$text->info()["boundingPoly"]["vertices"][0]["x"];
        $value = ["x" => $x, "text" => $text->description()];
        if (!isset($rows[$key])) {
            $rows[$key] = [];
        }
        $rows[$key][] = $value;
    }

    $text = [];
    foreach ($rows as $key => $value) {
        // Sort by x value.
        usort($value, "sortProc");

        // Concatenate each line; implode() avoids the leading space
        // that would otherwise stop ucfirst() from capitalising the row.
        $result = implode(" ", array_column($value, "text"));

        $text[] = $result;

        // Stop when we get here!
        if (preg_match("/from account/i", $result)) {
            break;
        }
    }

?>

<div class="row" style="padding: 20px;">
    <div class="col-12">
        <ul>
            <?php foreach ($text as $row): ?> 
                <li><h3> <?php echo ucfirst($row) ?></h3></li>
            <?php endforeach ?>
        </ul>
    </div>
</div>
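
The fixed `$lineHeight = 8` above is only a starting value. Below is a minimal sketch of one way it might be derived from the data instead. It assumes each word annotation carries four `boundingPoly` vertices with the top-left corner first and the bottom-left corner last, which holds for upright text. The helper name `estimateLineHeight` and the sample boxes are hypothetical and not part of the Vision client.

```php
<?php
// Hypothetical helper: estimate a row bucket size from the word boxes
// instead of hard-coding it. Assumes vertices[0] is the top-left and
// vertices[3] the bottom-left corner of each word.
function estimateLineHeight(array $annotations): float
{
    $heights = [];
    foreach ($annotations as $info) {
        $v = $info["boundingPoly"]["vertices"];
        // Missing coordinates are treated as 0.
        $heights[] = abs(($v[3]["y"] ?? 0) - ($v[0]["y"] ?? 0));
    }
    if ($heights === []) {
        return 1.0; // avoids a division by zero in the caller
    }
    sort($heights);
    // Take the median so one oversized heading or logo does not skew it.
    return (float) $heights[intdiv(count($heights), 2)];
}

// Made-up sample data in the shape returned by $text->info().
$sample = [
    ["boundingPoly" => ["vertices" => [
        ["x" => 10, "y" => 20], ["x" => 60, "y" => 20],
        ["x" => 60, "y" => 32], ["x" => 10, "y" => 32],
    ]]],
    ["boundingPoly" => ["vertices" => [
        ["x" => 70, "y" => 21], ["x" => 120, "y" => 21],
        ["x" => 120, "y" => 35], ["x" => 70, "y" => 35],
    ]]],
    ["boundingPoly" => ["vertices" => [
        ["x" => 10, "y" => 60], ["x" => 200, "y" => 60],
        ["x" => 200, "y" => 100], ["x" => 10, "y" => 100],
    ]]],
];

$lineHeight = estimateLineHeight($sample);
```

In the script above, the input would be the `info()` array of each remaining annotation after the `array_shift()` call. The median word height is only a rough bucket size, so it may still need tuning for a particular image.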
Terry Lennox
  • Thanks for trying. May I know what the `i` in `if (preg_match("/from account/i", $result))` is for? –  Sep 09 '19 at 12:18
  • I tried your code, it works but I can get the `:` in the line `Reference number: 2950211545` and `From Account 564155051577 WCA` –  Sep 09 '19 at 12:21
  • Oh the trailing /i is for a case insensitive match. I think in PHP this is often done like #from account#i. – Terry Lennox Sep 09 '19 at 12:23
  • The Reference number line should include the ":" character, the "From Account" line seems to be a bit of a problem, I think the ":" character is simply not being recognised. – Terry Lennox Sep 09 '19 at 12:34
  • The image does have a relatively low resolution (541 x 466), this may explain the fact that the text is not 100% accurate. – Terry Lennox Sep 09 '19 at 13:21

If you only want to limit the output, and it is always the same string that should stop the execution, then do the following:

<div class="row">
    <div class="col-12">
        <ol>
            <?php foreach ($text as $key => $texts): ?> 
                <?php if (strpos($texts->info()['description'], 'From Account') !== false) break; ?>
                <li><h6> <?php echo ucfirst($texts->info()['description']) ?></h6><br><br> 
                </li>
            <?php endforeach ?>
        </ol>
    </div>
</div>

Explanation:
If $texts->info()['description'] contains the text From Account it ends the execution of the foreach loop through break. If you need to check for multiple keywords read this.
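
For the multiple-keyword case, here is a minimal sketch that runs without the API. The helper `containsAny`, the keyword list, and the sample lines are made up for illustration. Note that it uses `stripos`, so unlike the `strpos` check above it ignores case.

```php
<?php
// Hypothetical helper: true if $haystack contains any of the $needles
// (case-insensitive because of stripos).
function containsAny(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        if (stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Made-up keyword list and sample lines.
$stopWords = ["From Account", "To Account"];
$lines = [
    "maybank2u.com",
    "Status: Successful",
    "From Account 564155051577 WCA",
    "Amount: RM100.00",
];

$kept = [];
foreach ($lines as $line) {
    if (containsAny($line, $stopWords)) {
        break; // stop at the first line that contains a keyword
    }
    $kept[] = $line;
}
```

After the loop, `$kept` holds only the lines that came before the first keyword match.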

An alternative solution would be to crop the image with imagecrop() before sending it to the API. This only works if you can be sure that the size and position of the text never change between images.
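
A rough sketch of the cropping idea, assuming the GD extension is available. The helper `cropTop` and the 200-pixel height are made up and would have to be adapted to the real screenshot. A blank 541 x 466 canvas stands in for the image so the snippet runs on its own.

```php
<?php
// Hypothetical helper: keep only the top $height pixels of a GD image.
function cropTop($image, int $height)
{
    return imagecrop($image, [
        "x"      => 0,
        "y"      => 0,
        "width"  => imagesx($image),
        "height" => $height,
    ]);
}

$croppedWidth = null;
$croppedHeight = null;

if (extension_loaded("gd")) {
    // Stand-in for imagecreatefrompng('./images/UUIPXl.png').
    $source = imagecreatetruecolor(541, 466);
    $cropped = cropTop($source, 200); // 200 is an assumed value

    if ($cropped !== false) {
        $croppedWidth = imagesx($cropped);
        $croppedHeight = imagesy($cropped);
        // The cropped image could then be written out with imagepng()
        // and that file passed to the Vision client instead.
    }
}
```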

P.S. Are you sure everyone should see the private data in your screenshot?

Update1
As you asked, this is the same code using the alternative syntax for control structures:

<div class="row">
    <div class="col-12">
        <ol>
            <?php foreach ($text as $key => $texts): ?> 
                <?php if (strpos($texts->info()['description'], 'From Account') !== false): ?>
                <?php break; ?>
                <?php endif; ?>
                <li><h6> <?php echo ucfirst($texts->info()['description']) ?></h6><br><br> 
                </li>
            <?php endforeach ?>
        </ol>
    </div>
</div>

Maybe this solves your problem as the same page includes this note:

Mixing syntaxes in the same control block is not supported.

Update2

After you updated your question, it is clearer now. The output does not contain one element per line of text. Instead, a single element contains multiple lines of text. That is why my first code did not echo anything: it finds From Account in the very first array element.

So we need to search for the string From Account and cut the text at that position:

<div class="row">
    <div class="col-12">
        <ol>
            <?php foreach ($text as $key => $texts): ?> 
                <?php
                // Use a separate variable so the $text array being iterated is not overwritten.
                $line = $texts->info()['description'];
                // search for string
                $pos = strpos($line, 'From Account');
                if ($pos !== false) {
                    // if the string was found cut the text
                    $line = substr($line, 0, $pos);
                }
                ?>
                <li><h6> <?php echo $line ?></h6><br><br> 
                </li>
            <?php endforeach ?>
        </ol>
    </div>
</div>

Optionally you could add this before <?php endforeach ?> to skip all following array elements:

                <?php
                if ($pos !== false) {
                    break;
                }
                ?>

Note: @TerryLennox uses preg_match to find From Account. Apart from case sensitivity (his pattern uses the /i flag, for which stripos would be the equivalent), there is no practical difference between this and using strpos, and many prefer to avoid regex. But his answer contains another good tip: he uses the text position information to add the text line by line to a new array. This could be really useful depending on how you want to display or store the text.
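
To make the comparison concrete, here is a small sketch on a made-up OCR line in which the marker is in lower case. It only uses standard string functions.

```php
<?php
// Made-up OCR line with the marker in lower case.
$line = "Amount: RM100.00 from account 564155051577 WCA";

// strpos is case-sensitive, so this search fails.
$caseSensitive = strpos($line, "From Account");

// stripos ignores case and returns the offset of the match.
$caseInsensitive = stripos($line, "From Account");

// preg_match with the /i flag also ignores case and returns 1 on a match.
$regexMatch = preg_match("/from account/i", $line);

// Cut everything from the marker onwards, as in Update2.
$cut = ($caseInsensitive !== false)
    ? rtrim(substr($line, 0, $caseInsensitive))
    : $line;
```

Here `$cut` ends up holding only the part of the line before the marker.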

mgutt
  • Thanks, but the code is not working; I am still getting all the text from the image. –  Sep 06 '19 at 15:59
  • I just downloaded the image from Google Images, so it is not private data, haha. –  Sep 06 '19 at 16:00
  • @overflowstack Does `$texts->info()['description']` contain the text `From Account`, or has the string been split, or is the whitespace not a regular space? Or maybe you need to use `stripos` to overcome case sensitivity: https://www.php.net/manual/function.stripos.php – mgutt Sep 07 '19 at 11:35
  • Here you can see that the code works: http://sandbox.onlinephpfunctions.com/code/43f4dc62ca44c5e73ac1c8b8e0ee5a0db22a7810 – mgutt Sep 07 '19 at 11:41
  • I updated my question and added the output, please have a look at it. –  Sep 09 '19 at 08:11
  • I know the code should work, but when I use it I get a blank result; it did not extract any text. When I change your code to `From Account:`, it gets all the text, including the text below `From account`. –  Sep 09 '19 at 08:19
  • If it's blank, maybe the first result contains the string?! Use `print_r($texts->info()['description'])` to check the content of your array. Or is the page really blank, without any HTML code? That would mean you have a PHP error, and you should enable error reporting. – mgutt Sep 09 '19 at 08:36
  • I tried `print_r` and the content is the same as the first result. –  Sep 09 '19 at 09:04
  • no, the page is not blank with or without html code btw why you dont have `` ? –  Sep 09 '19 at 09:07
  • @overflowstack endif belongs to the alternative syntax for writing an if condition. I used the general way, or to be exact a shorthand of the general way, of writing an if condition. You can try the alternative posted as Update1 in my answer, but I do not think it will change anything. – mgutt Sep 09 '19 at 20:19
  • @overflowstack OK, I see the problem after checking your updated question and output. The problem is that you receive the complete text as a single array element. I will update my answer. Wait for Update2. – mgutt Sep 09 '19 at 20:21