
I am writing a script in Perl, but I'm just a beginner. The program downloads an HTML page and tries to find phrases enclosed by tags. I have attached the code below; when I run it, there are no errors, but it does nothing (no output). Can anybody give me some advice on what might be wrong?

open ':std', ':encoding(UTF-8)';

my $s = get("xxx.html");

foreach my $line (split(/\n/,$s)) {

  if (m,<>(.*?)<>,g) {

    if(eof()) {
        close(FILE);    }

     print "$1\n";
     last if eof();
        }    
}
Nagaraju
LuBoB
  • Paired tags are actually quite tricky to get correct regular expressions for. Your regex `m,<>(.*?)<>,g` will only find content between tags looking *exactly* like this: `<>` - there probably are none in the HTML. – Neil Slater May 11 '13 at 11:21
  • I have the letter "b" between the tag brackets, so it is specifically defined; the problem isn't there. – LuBoB May 11 '13 at 12:34

2 Answers

3

I do spot numerous problems.

  1. `if (//g)` makes no sense and can cause actual (subtle) problems. Remove the `g`.
  2. You check `eof()` (twice!) without ever using `<>`. Huh?
  3. You close file handle `FILE`, yet you never opened any such file handle.
  4. You close file handle `FILE` after checking if a different file handle has reached eof.
  5. You say your code doesn't do anything, yet you didn't bother to check if `get` returned something other than `undef`.

By the way, always use `use strict; use warnings;`. Not sure if you did or not.
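
The points above might be applied roughly as follows. This is only a sketch under some assumptions: the tag is taken to be `<b>`, as mentioned in the question's comments; the helper name `extract_bold` is hypothetical; and a hard-coded string stands in for the result of `get` (from LWP::Simple) so that the example runs without a network. It also binds the match to `$line` explicitly, because a bare `m,...,` tests `$_`, which the loop in the question never sets. The `g` modifier is kept here only because the match sits in a `while` loop, where it steps through every match on the line.

```perl
use strict;
use warnings;

# Return the text of every <b>...</b> pair found in $html.
# (extract_bold is a hypothetical helper name, not from the question.)
sub extract_bold {
    my ($html) = @_;
    return () unless defined $html;    # get() returns undef on failure

    my @found;
    for my $line (split /\n/, $html) {
        # Match against $line, not the implicit $_.
        while ($line =~ m,<b>(.*?)</b>,g) {
            push @found, $1;
        }
    }
    return @found;
}

# Stand-in for: my $s = get("xxx.html");
my $s = "<p>one <b>first</b> and <b>second</b></p>\n<p><b>third</b></p>\n";

print "$_\n" for extract_bold($s);
```

No `eof()` or `close` calls are needed, since the page content is held in a scalar rather than read from a file handle.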

ikegami
  • 1. When I remove "g", the warning "Search pattern not terminated" appears ... 2. I have now commented out eof() and nothing changed ... 3. I do have "use strict". – LuBoB May 11 '13 at 11:59
  • 1. You removed more than `g` if you get that error and you didn't before. 2. That doesn't mean it made any sense whatever to have it. 3. Well, that's only half of the recommendation. – ikegami May 11 '13 at 12:09
  • 1. I removed only ",g", nothing more ... 2. I know that eof() doesn't make any sense because I only use variables, but I don't know how to rewrite it now ... 3. Now I have added "use warnings" and it prints out some weird warning :/ – LuBoB May 11 '13 at 12:23
  • @LuBoB Removing the comma `,` before the `g` was one of the problems. In Perl the `m` operator is followed by a character that marks the beginning of the pattern, and the same character marks the end of the pattern. So in `m,abc,g` the start and end of the pattern are marked by the `,` characters. More commonly `/` is used, as in `m/abc/`, and in that case the `m` can also be omitted. Perl also allows the bracketing characters, i.e. `()`, `[]`, `{}` and `<>`, to be used in pairs, e.g. `m(abc)`. See also http://stackoverflow.com/questions/5770590/which-characters-can-be-used-as-regular-expression-delimiters – AdrianHHH May 11 '13 at 13:51
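
To illustrate the delimiter rules described in the comment above, here is a small sketch. The sample string and the `<b>` tag are assumptions made for the example; the three matches are equivalent and differ only in the delimiter used.

```perl
use strict;
use warnings;

# Sample input, assumed for illustration only.
my $text = 'say <b>hello</b> twice';

my @results;
# Default "/" delimiter: the "/" inside the pattern must be escaped.
push @results, $1 if $text =~ /<b>(.*?)<\/b>/;
# Comma delimiter: the closing comma is required, even with no modifiers.
push @results, $1 if $text =~ m,<b>(.*?)</b>,;
# Paired bracketing delimiters.
push @results, $1 if $text =~ m{<b>(.*?)</b>};

print "$_\n" for @results;
```

Deleting `,g` from `m,...,g` removes the closing delimiter along with the modifier, which is why Perl then reports "Search pattern not terminated".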
0

You can use an XML module (XML::Parser), available over here. It grabs the text between the tags.
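
As a rough sketch of what that could look like, the example below uses XML::Parser's event handlers to collect the text inside `<b>` elements. It rests on assumptions: the input is a hard-coded, well-formed snippet standing in for the downloaded page, the tag is `<b>`, and the variable names are made up for illustration. XML::Parser dies on markup that is not well-formed XML, which much real-world HTML is not, so this only applies if the page is well-formed.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical well-formed input standing in for the downloaded page.
my $xml = '<html><body><p>one <b>first</b> two <b>second</b></p></body></html>';

my @phrases;        # text collected from each <b> element
my $depth  = 0;     # greater than 0 while inside a <b> element
my $buffer = '';    # text gathered for the current <b> element

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            if ($element eq 'b') {
                $buffer = '' if $depth == 0;
                $depth++;
            }
        },
        Char => sub {
            my ($expat, $string) = @_;
            $buffer .= $string if $depth > 0;
        },
        End => sub {
            my ($expat, $element) = @_;
            if ($element eq 'b') {
                $depth--;
                push @phrases, $buffer if $depth == 0;
            }
        },
    },
);

$parser->parse($xml);
print "$_\n" for @phrases;
```

The text is accumulated in a buffer because the `Char` handler may be called more than once for a single run of text.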

Nagaraju
  • Could you help with the XML syntax? What would need to change in the code? I don't have much time, and because I'm new to Perl it could take a lot of time... – LuBoB May 11 '13 at 12:13
  • There is an example in the link I specified; it's not very difficult to understand. – Nagaraju May 12 '13 at 12:47