I need to extract captcha from url and recognised it with Tesseract. My code is:
#!/usr/bin/perl -X
###
$user = 'user'; #Enter your username here
$pass = 'pass'; #Enter your password here
###
#Server settings
$home = "http://perltest.adavice.com";
$url = "$home/c/test.cgi?u=$user&p=$pass";
###Add code here!
#Grab img from HTML code
#if ($html =~ /<img. *?src. *?>/)
#{
# $img1 = $1;
#}
#else
#{
# $img1 = "";
#}
$img2 = grep(/<img. *src=.*>/,$html);
if ($html =~ /\img[^>]* src=\"([^\"]*)\"[^>]*/)
{
my $takeImg = $1;
my @dirs = split('/', $takeImg);
my $img = $dirs[2];
}
else
{
print "Image not found\n";
}
###
die "<img> not found\n" if (!$img);
#Download image to server (save as: ocr_me.img)
print "GET '$img' > ocr_me.img\n";
system "GET '$img' > ocr_me.img";
###Add code here!
#Run OCR (using shell command tesseract) on img and save text as ocr_result.txt
system("tesseract ocr_me.img ocr_result");
print "GET '$txt' > ocr_result.txt\n";
system "GET '$txt' > ocr_result.txt";
###
die "ocr_result.txt not found\n" if (!-e "ocr_result.txt");
# check OCR results:
$txt = 'cat ocr_result.txt';
$txt =~ s/[^A-Za-z0-9\-_\.]+//sg;
$img =~ s/^.*\///;
print `echo -n "file=$img&text=$txt" | POST "$url"`;
As you see I`m trying extract img src tag. This solution did not work for me ($img1) use shell command tesseract in perl script to print a text output. Also I used adopted version of that solution($img2) How can I extract URL and link text from HTML in Perl?.
If you need HTMLcode from that page, here is:
<html>
<head>
<title>Perl test</title>
</head>
<body style="font: 18px Arial;">
<nobr>somenumbersimg src="/JJ822RCXHFC23OXONNHR.png"
somenumbers<img src="/captcha/1533030599.png"/>
somenumbersimg src="/JJ822RCXHFC23OXONNHR.png" </nobr><br/><br/><form method="post" action="?u=user&p=pass">User: <input name="u"/><br/>PW: <input name="p"/><br/><input type="hidden" name="file" value="1533030599.png"/>Text: <input name="text"></br><input type="submit"></form><br/>
</body>
</html>
I got error that image not found. My problem is wrong regular expression, as I think.I can not install any modules such as HTTP::Parser or similar