
I'd like to write a function that returns the content between tags (either the whole string or a specified number of characters after the opening tag). The linear code is below:

$tag='<body>';
//case1
$source=substr($source,strpos($source,$tag)+strlen($tag));
$sub=substr($source,0,strpos($source,'<'));
//case2
$source=substr($source,strpos($source,$tag)+strlen($tag));
$sub=substr($source,0,3);

The function will accept 3 parameters: the source code, the specified tag and the substring length (for case 2), and will return 2 values: the trimmed source and the substring. So basically I want to have a function like this:

function p($source,$tag,$len=null) { // $len is optional: the calls below omit it for case 1
  $source=substr($source,strpos($source,$tag)+strlen($tag));
  if(isset($len)) $sub=substr($source,0,$len);
  else $sub=substr($source,0,strpos($source,'<'));
  $ret=array();
  $ret[0]=$source;
  $ret[1]=$sub;
  return $ret;
}
//
$source=p($source,'<strong>')[0];
$sub1=p($source,'<strong>')[1];
$source=p($source,'<p>',100)[0];
$sub2=p($source,'<p>',100)[1];
user965748
  • What language is this? Please retag with that language. – Matt Fenwick Nov 16 '11 at 15:17
  • Maybe use an XML parser? http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html – Maxime Pacary Nov 16 '11 at 18:22
  • @FrostyZ I don't need to parse all the code, just chosen tags and 1 function will be enough. – user965748 Nov 16 '11 at 18:30
  • 1
    @user965748 But you must parse chosen tags *over all the HTML code*, right? Maybe a look at this one: http://simplehtmldom.sourceforge.net/ ? + related question (IMO): http://stackoverflow.com/questions/3650125/how-to-parse-html-with-php – Maxime Pacary Nov 16 '11 at 18:34
  • @FrostyZ The links you posted here are surely useful, but my needs are quite specific, so a solution with a function seems better to me. However, it's true that I need to go through all the code. In fact I have 2 while loops: the first one searches for paragraphs and the one inside processes the content of the paragraph. – user965748 Nov 16 '11 at 19:18
  • @user965748 I've posted those links because I think that they will help you to address your specific needs, in a simpler way than what you're trying. – Maxime Pacary Nov 16 '11 at 19:26
  • @FrostyZ I'll study it more thoroughly then. Do you know if it has at least some real advantage like lower memory consumption or better speed? – user965748 Nov 16 '11 at 19:46
  • @user965748 As said in the second answer here http://stackoverflow.com/questions/3650125/how-to-parse-html-with-php "**Why you shouldn't and when you should use regular expressions?**", cases exist where using regular expressions can be appropriate (doing very simple tasks, and possibly improving performance) – Maxime Pacary Nov 16 '11 at 19:52
  • simplehtmldom is a very useful little script and I recommend it! – Marshall Davis Nov 16 '11 at 20:26

1 Answer

function get_inner_html( $source, $tag, $length = NULL )
{
    $closing_tag = str_replace( '<', '</', $tag ); // HTML closing tags are opening tags with a preceding slash
    $closing_tag_length = strlen( $closing_tag );
    $tag_length = strlen( $tag ); // Will need this for offsets
    $search_offset = 0; // Start at the start
    $tag_internals = array();
    // Keep searching for tags until we find no more; compare against FALSE because a match at position 0 is falsy
    while ( ( $tag_position = strpos( $source, $tag, $search_offset ) ) !== FALSE )
    {
        $content_start = $tag_position + $tag_length; // First character after the opening tag
        $tag_end = strpos( $source, $closing_tag, $content_start ); // Next closing occurrence after this opening tag
        if ( $tag_end === FALSE )
        {
            break; // Unclosed tag: stop instead of computing a bogus length
        }
        if ( $length === NULL )
        {
            $substring_length = $tag_end - $content_start;
        } else
        {
            $substring_length = $length;
        }
        $tag_internals[] = substr( $source, $content_start, $substring_length );
        $search_offset = $tag_end + $closing_tag_length; // The next iteration of loop will start at this position, effectively trimming off previous locations
    }
    return $tag_internals ? $tag_internals : FALSE; // Returns an array of findings for this tag or false if tag not found
}

Your question asks for either the full string or a subset based on the length passed. If you need both, you'll need to remove the if and do a second substr to pull out the full string, probably saving that to another array and returning an array of two arrays: one of the full strings and one of the trimmed strings.
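A minimal sketch of that "both options" variant, in case it helps. The function name get_inner_html_both and the 'full'/'trimmed' keys are made up for illustration, and unlike the function above the trimmed string is cut from the inner content, so it never runs past the closing tag:

```php
<?php
// Hypothetical helper (not part of the code above): for every occurrence of
// $tag, collect both the full inner string and its first $length bytes.
function get_inner_html_both(string $source, string $tag, int $length): array
{
    $closing_tag = str_replace('<', '</', $tag);
    $full = [];
    $trimmed = [];
    $offset = 0;
    while (($start = strpos($source, $tag, $offset)) !== false) {
        $content_start = $start + strlen($tag);
        $end = strpos($source, $closing_tag, $content_start);
        if ($end === false) {
            break; // unclosed tag: stop searching
        }
        $inner = substr($source, $content_start, $end - $content_start);
        $full[] = $inner;
        $trimmed[] = substr($inner, 0, $length); // the second substr
        $offset = $end + strlen($closing_tag);
    }
    return ['full' => $full, 'trimmed' => $trimmed];
}

$result = get_inner_html_both('<p>Hello world</p><p>Hi</p>', '<p>', 5);
// $result['full'] is ['Hello world', 'Hi'], $result['trimmed'] is ['Hello', 'Hi']
```

Calling the function once and reading both arrays from the result avoids searching the source twice.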

I didn't run this code, so some bugs may exist (read: do exist), and it only works for the most basic of tags. If any of your tags have attributes, you'll need to strip those out and adjust the closing-tag calculation; otherwise the function will look for long closing tags that don't exist.
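One possible way to cope with attributes is to pass only the tag name and let a regular expression skip whatever sits between the name and the closing `>`. This swaps strpos for preg_match_all, so treat it as a sketch rather than a drop-in replacement: the helper name is hypothetical, and it still breaks on nested tags of the same name or on a `>` inside an attribute value.

```php
<?php
// Hypothetical helper: $name is a bare tag name such as 'p', not '<p>'.
function get_inner_html_with_attributes(string $source, string $name): array
{
    $quoted = preg_quote($name, '~');
    // "<name", a word boundary so that 'p' does not match "<pre",
    // any attributes up to ">", then the shortest content up to "</name>"
    $pattern = '~<' . $quoted . '\b[^>]*>(.*?)</' . $quoted . '\s*>~is';
    preg_match_all($pattern, $source, $matches);
    return $matches[1]; // captured inner content of each element
}

$found = get_inner_html_with_attributes(
    '<p class="intro">One</p><pre>skip</pre><p>Two</p>',
    'p'
);
// $found is ['One', 'Two']
```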

This is a simple example, but bear in mind that many of PHP's string functions can be resource-hungry and are not well suited to processing long strings (like full HTML files); parsing line by line rather than treating the whole file as one string may work better. I agree with everyone who says to write or use an existing parser, as you are likely to get better results.
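For comparison, here is roughly what the parser route could look like with PHP's bundled DOMDocument class. This assumes the DOM extension is enabled; the function name get_tag_text is made up, and it returns the text content of each element (nested markup stripped) rather than the raw inner HTML. Note that substr counts bytes, so multibyte text would need mb_substr instead.

```php
<?php
// Hypothetical helper built on DOMDocument (requires the DOM extension).
function get_tag_text(string $html, string $name, ?int $length = null): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($name) as $node) {
        $text = $node->textContent; // text of the element and its children
        $texts[] = $length === null ? $text : substr($text, 0, $length);
    }
    return $texts;
}

$texts = get_tag_text(
    '<html><body><p class="a">First <b>bold</b></p><p>Second</p></body></html>',
    'p',
    5
);
// $texts is ['First', 'Secon']
```

Attributes and nested tags are handled by the parser here, which is the main thing the strpos approach above has to work around by hand.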

Marshall Davis