24

How can I check if PHP string contents contain any HTML contents?

I'm not good with Regular Expressions so I would like to have a function named "is_html" to check this. :) thank you!

Cilan

6 Answers

35

If you want to test whether a string contains something that looks like "<something>" (a lazy check, but it may work for you), you can try something like this:

function is_html($string)
{
  // preg_match() returns 1 on a match, 0 on no match, and false on error.
  return preg_match("/<[^<]+>/", $string) === 1;
}
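
As a rough illustration of what this pattern accepts and rejects, here is a small sketch that runs it against a few made-up sample strings (the inputs are assumptions chosen for illustration, not from the original answer):

```php
<?php
// Same check as above: a "<", then one or more non-"<" characters, then a ">".
function is_html($string)
{
    return preg_match("/<[^<]+>/", $string) === 1;
}

var_dump(is_html('<b>bold</b>'));     // bool(true)
var_dump(is_html('plain text'));      // bool(false)
var_dump(is_html('I <3 PHP'));        // bool(false): there is no closing ">"
var_dump(is_html('1 < 2 and 3 > 2')); // bool(true): a false positive
```

The last case shows why the check is "lazy": any "<" that is eventually followed by a ">" counts as a tag.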
nico
  • You know of anyway to represent exactly this expression without a regex? I'm using the same expression, and curious of the difference of regex vs. non-regex. – onassar Feb 12 '13 at 07:27
  • 1
    Simple, classy, nice hack, I like. – Kzqai Apr 23 '13 at 16:53
  • 10
    @IanWood - not really. Take for example `I <3 PHP` - if checking for HTML using strip_tags like this `strlen(strip_tags($string)) !== strlen($string)`, it will incorrectly determine that it contains HTML, when in fact it doesn't. I'm not saying that this answer is the best way to determine if the string contains HTML, but it is definitely not overkill. – buggedcom Aug 19 '13 at 10:58
17

Instead of using regex (like the other suggestions here) I use the following method:

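    function isHtml($string)
    {
        // strip_tags() removes anything it treats as a tag; if the result
        // differs from the input, the string contained HTML-like markup.
        // Strict comparison avoids PHP's loose string comparison rules.
        return $string !== strip_tags($string);
    }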

Here I use the PHP function strip_tags to remove any HTML from the string. The function then compares the stripped string with the original; if they do not match, HTML tags were present.
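
A minimal sketch of how this comparison behaves on a few sample inputs (the inputs are made up for illustration). It also includes the `I <3 PHP` case raised in the comments on this page, where strip_tags treats `<3 PHP` as an unterminated tag:

```php
<?php
// Compare the string with its tag-stripped version.
function isHtml($string)
{
    return $string !== strip_tags($string);
}

var_dump(isHtml('<p>Hello</p>')); // bool(true)
var_dump(isHtml('Hello'));        // bool(false)
var_dump(isHtml('I <3 PHP'));     // bool(true): a false positive
```

So this approach is simple, but it reports HTML for any text in which strip_tags removes something, including a stray "<" directly followed by a non-space character.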

17

The accepted answer will consider a string containing <something> as HTML which, obviously, it is not.

I use the following, which may or may not be a better idea. (Comments appreciated.)

function isHTML( $str ) { return preg_match( "/\/[a-z]*>/i", $str ) != 0; }

This matches any string containing a slash followed by zero or more letters and a closing bracket, i.e. a closing tag such as </b> or a self-closing />.

The above function returns:

<something>             is NOT HTML
<b>foo</b>              is HTML
<B>foo</B>              is HTML
<b>foo<b>               is NOT HTML
<input />               is HTML
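
The table above can be reproduced with a short loop. This is only a sketch; the last sample string is an assumption added to show that plain text containing a slash, letters, and ">" is also reported as HTML:

```php
<?php
// Same pattern as above: "/" followed by zero or more letters and ">".
function isHTML($str)
{
    return preg_match('/\/[a-z]*>/i', $str) === 1;
}

$samples = [
    '<something>',
    '<b>foo</b>',
    '<B>foo</B>',
    '<b>foo<b>',
    '<input />',
    'either/or> choice', // made-up plain text; matches because of "/or>"
];

foreach ($samples as $sample) {
    printf("%-24s %s\n", $sample, isHTML($sample) ? 'is HTML' : 'is NOT HTML');
}
```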
Kevin Traas
10

Probably the easiest way would be something like:

<?php

function hasTags( $str )
{
    // strcmp() returns 0 only when both strings are identical.
    return strcmp( $str, strip_tags( $str ) ) !== 0;
}

$str1 = '<p>something with <a href="/some/url">html</a> in.';
$str2 = 'a string.';

var_dump( hasTags( $str1 ) ); // true - has tags.
var_dump( hasTags( $str2 ) ); // false - no tags.
Ian Wood
  • 1
    This causes false positives. Take, for example, the string `I <3 PHP`. This function would determine that tags exist when they don't. – buggedcom Aug 19 '13 at 11:00
  • Hmmm, but will strcmp error on UTF-8 text, like many string functions do? – Kzqai Mar 18 '14 at 05:21
1

Here's what I came up with

function isHtml($string){
     // preg_match() returns 1 when the pattern matches, so no $matches array is needed.
     return preg_match("/<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/", $string) === 1;
}

You just pass a string and check if it returns true or false. As simple as that.
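
For illustration, a small sketch of calling this function on a few made-up inputs. Because the pattern requires a word character right after "<" (or "</"), stray comparison signs in plain text are not reported as HTML:

```php
<?php
// Same pattern as above, returning a boolean directly.
function isHtml($string)
{
    $pattern = "/<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/";
    return preg_match($pattern, $string) === 1;
}

var_dump(isHtml('<a href="/some/url">link</a>')); // bool(true)
var_dump(isHtml('<br />'));                        // bool(true)
var_dump(isHtml('1 < 2 and 3 > 2'));               // bool(false)
var_dump(isHtml('I <3 PHP'));                      // bool(false)
```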

-2

That depends on what you define to be html contents.

The most straightforward thing is to test if the string contains the html tag which can be done with the regex

<html.*>

In php the test will be

if (preg_match('/<html.*>/', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}

If you want to check whether you have valid HTML, it's better to use an HTML parser.
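
One possible way to do that is sketched below with PHP's bundled DOMDocument class, collecting the problems libxml reports while parsing. The helper name `htmlParseErrors` is hypothetical, and this assumes the DOM extension is available. Note that libxml's HTML parser is lenient and predates HTML5, so treat the error list as a hint, not as a strict validity check.

```php
<?php
// Hypothetical helper: returns the libxml errors reported while parsing $html.
function htmlParseErrors(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect parse problems internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return $errors;
}

$errors = htmlParseErrors('<p>Hello <b>world</b></p>');
echo count($errors) === 0
    ? "no parse errors reported\n"
    : count($errors) . " parse error(s) reported\n";
```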

buckley