
Sorry, I'm new to Perl and could not find a similar answer.

HTML file:

<div class="user_rating">
.
.
<span class="genre">
.
.
.
<span class="genre">
.
.
.
<span class="genre">
.
.
.
<span class="genre">

Perl file:

$content =~ /<div class="user_rating">(.*)<span class="genre">/gs;
$empty = $1;

The $empty variable ends up containing everything from <div class="user_rating"> to the last <span class="genre">.

But I only want the content from <div class="user_rating"> to the first <span class="genre">. How should I modify my code? I know it is a regular expression problem.

Any help would be appreciated.

user23256
  • 4
    If you are going to do a lot of HTML parsing, look into something like `HTML::TreeBuilder` (http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/TreeBuilder.pm), which will parse the HTML for you. A regex is certainly a useful quick-and-dirty solution for tasks like this, but it is not a robust way of processing HTML in general. – dan1111 Oct 23 '12 at 09:07
  • 3
    [Don't try to parse HTML with regexps](http://stackoverflow.com/a/1732454/470535) yourself, use a [HTML parser](http://search.cpan.org/dist/HTML-Parser/Parser.pm) instead. – dgw Oct 23 '12 at 09:09

1 Answer


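Modify your regexp: `.*` is greedy, so it matches as much as possible and only gives back enough to reach the last `<span class="genre">`. The non-greedy form `.*?` matches as little as possible, so it stops at the first one.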

$content =~ /<div class="user_rating">(.*?)<span class="genre">/s;
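
To see the difference side by side, here is a minimal sketch. The sample `$content` (the rating text and genre names) is made up for illustration and only stands in for the real page; the variable names `$greedy` and `$lazy` are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real page content.
my $content = <<'HTML';
<div class="user_rating">
rating: 8.1
<span class="genre">Drama</span>
<span class="genre">Comedy</span>
<span class="genre">Action</span>
HTML

# Greedy: .* grabs everything, then backtracks to the LAST genre span.
# The /s modifier lets . match newlines as well.
my ($greedy) = $content =~ /<div class="user_rating">(.*)<span class="genre">/s;

# Non-greedy: .*? expands one character at a time and stops
# at the FIRST genre span.
my ($lazy) = $content =~ /<div class="user_rating">(.*?)<span class="genre">/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Matching in list context, as in `my ($lazy) = ...`, returns the captured group directly and leaves the variable undefined when nothing matches, which avoids reading a stale `$1` from an earlier successful match.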
zb226
Pavel Vlasov
  • 3
    @user1767718 Welcome to SO! If this answer worked for you, you may want to *accept* it as well. But also consider the parser hints in the question's comments :) – memowe Oct 23 '12 at 09:53