-3

What regex can I use to find an HTML tag and then extract the string inside it?

<?php 

$html = "<span class="equipped">360</span>"
$match = preg_match("???", $html, $matches);

?>
Shawn
  • First, I would recommend fixing your $html string by using `'` or escaping the `"` – Book Of Zeus Nov 25 '11 at 18:35
  • This is too ambiguous; what string in particular do you want to find, within what context? Also, your PHP has a syntax error (double quotes in double quotes), and you shouldn't use regex to parse HTML. – Bojangles Nov 25 '11 at 18:35
  • You need to get your string syntax in order first. Writing a regex is simple, as most fixed characters can stay as is. You just need a placeholder for the number. See the PHP manual on regex syntax and examples: http://www.php.net/manual/en/reference.pcre.pattern.syntax.php – mario Nov 25 '11 at 18:36
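A minimal sketch of what the comments above describe, assuming the markup is exactly the fixed string from the question: the string is single-quoted to fix the quoting problem, the fixed characters of the tag stay as they are, and `(\d+)` is the placeholder for the number. The `~` delimiter and the variable names `$pattern` and `$value` are choices made for this illustration, not part of the original post.

```php
<?php
// Single quotes avoid having to escape the double quotes inside the attribute.
$html = '<span class="equipped">360</span>';

// Fixed characters stay as they are; (\d+) captures the number.
// "~" is the delimiter, so the "/" in "</span>" needs no escaping.
$pattern = '~<span class="equipped">(\d+)</span>~';

if (preg_match($pattern, $html, $matches) === 1) {
    $value = $matches[1]; // contents of the first capture group
    echo $value, "\n";
} else {
    $value = null;
    echo "no match\n";
}
```

This only works while the tag looks exactly like the pattern; an extra attribute, different quoting, or nested markup would make it fail to match.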

2 Answers

0

You should never parse HTML with regular expressions... you will find plenty of explanations here of why you should not do that.

You can take a look at this previous SO post where they discuss various frameworks which allow you to process HTML through PHP such as phpQuery and QueryPath.

npinti
-1

As npinti points out, you shouldn't use a regular expression to parse a non-regular language. Instead, you can use PHP's DOMDocument to find the text of any node you want. Here's an example for capturing the <span> element's inner text and a demonstration to show how it works.

$html = "<span>Text</span>";

// Parse the HTML string into a DOM tree
$doc = new DOMDocument();
$doc->loadHTML($html);

// Print the inner text of every <span> element
$elements = $doc->getElementsByTagName("span");
foreach ($elements as $el) {
    echo $el->nodeValue . "\n";
}

Demo

Edit: My example uses a semi-complete HTML document, but DOMDocument will also successfully parse an HTML string such as `$html = '<span>Text</span>';` (see here).
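Applied to the markup from the question, the same idea might look like the sketch below. It assumes the goal is the text of every `<span>` whose class attribute is exactly `equipped`. The second `<span>` and the `$values` array are added here purely for illustration.

```php
<?php
$html = '<span class="equipped">360</span><span class="other">12</span>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world markup is often imperfect.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Exact match on the class attribute; elements carrying several
// classes would need contains() instead.
$nodes = $xpath->query('//span[@class="equipped"]');

$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}
print_r($values);
```

Using an XPath query rather than `getElementsByTagName` keeps the class filter in one place, so other `<span>` elements on the page are ignored.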

nickb
  • DOMDocument is off-topic unless someone has the courtesy to retag the question. It's making search on SO useless if questions are always trashed with the opposite of what was actually asked for (even if OP is clueless). The downvote is mainly for the "regular expression" vs. "non-regular language" fallacy. Modern regexes aren't regular languages. – mario Nov 25 '11 at 18:50
  • I don't understand how that's a fallacy - HTML is irregular, using a regex is the wrong tool for the job. The question was how to extract the inner text of an HTML span element using a regular expression. The first answer is why you shouldn't use a regex. The second is an answer that shows how to do it without a regex. Where's the problem? – nickb Nov 25 '11 at 18:56
  • Two problems. Firstly, [it's not true](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491). Secondly, giving OP a readymade solution is of no educative value. (Might even be worse than honoring his literal request for the magic regex codez plz). Also, it's a dupe. – mario Nov 25 '11 at 19:00
  • mario, I'm trying to learn. I'm sorry for asking what the regex code would look like. – Shawn Nov 25 '11 at 19:23
  • @nickb - Instead of putting HTML in the string, would I be able to put a URL? – Shawn Nov 25 '11 at 19:25
  • @Shawn - No, use something like [file_get_contents](http://www.php.net/file_get_contents) to pull the contents of the URL into a string (if [allow_url_fopen](http://www.php.net/allow_url_fopen) is enabled), then pass that to `$doc->loadHTML()`. – nickb Nov 25 '11 at 21:05
  • @mario You're misinterpreting my comment. For the OP's job, a regex is not the correct tool, especially if the desire is to extract the contents of an arbitrary HTML tag. Sure, the example linked is done in Perl, showing it's possible, but as a beginner, it certainly isn't feasible. I also don't understand why a post on SO shouldn't contain a "readymade solution" - I've yet to see an answer say "Yes, it's possible, start here and figure it out". Seems counterintuitive to me. – nickb Nov 25 '11 at 21:09
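The `file_get_contents` approach nickb describes in the comments could be sketched roughly as follows. `spanTexts` is a hypothetical helper name, and the URL in the usage comment is a placeholder. Fetching a remote URL this way only works if `allow_url_fopen` is enabled; a local file path works regardless.

```php
<?php
// Hypothetical helper: $source is a local path or, when allow_url_fopen
// is enabled, a URL. Returns the inner text of every <span> found.
function spanTexts($source)
{
    $html = file_get_contents($source);
    if ($html === false || $html === '') {
        return []; // nothing could be read, so there is nothing to parse
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('span') as $el) {
        $texts[] = $el->nodeValue;
    }
    return $texts;
}

// Usage with a placeholder URL (an assumption, not from the original post):
// print_r(spanTexts('http://example.com/'));
```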