Preg_match, if formatted correctly

Question

if(preg_match("/(.*(&lt;code&gt;).*(&lt;\/code&gt;).*)*/", $string))

I have been trying for many hours, but I can't make it work. I want to check whether the user formatted the text correctly, like this:

(any_string*<code>any_string*</code>any_string*)*

Only then would I format the text. Here * means zero or more occurrences (so any_string may be empty). What's wrong with my expression?

edit: I want to match lalala text <code>dlalala lala code</code> lalalal. If it's lala <code> lalala or lalal </code> <code> lalala <code>alala then we don't want to match it.

Your regex isn't incorrect at all. However, are you trying to match multiple-line content between the tags? Then remember the dot matches anything BUT newlines. — Adonais, Feb 26 '12 at 04:17
Can you post some examples of the string that you want to match? — Broncha, Feb 26 '12 at 04:18
Does any_string* has to be the same in front, middle and after the code tags? — Broncha, Feb 26 '12 at 04:28
I think the reason it is always true is that the entire thing is contained within a subpattern with an asterisk after it, which means the entire pattern can match zero times and still be considered a match. — Rick, Feb 26 '12 at 04:31
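Rick's point can be checked directly. The sketch below is only an illustration: it runs the pattern from the question against strings that are not correctly formatted. The helper name matchesQuestionPattern and the sample strings are assumptions made for this example, not part of the question.

```php
<?php
// Hypothetical helper (not from the question): wraps the original pattern.
function matchesQuestionPattern(string $string): bool
{
    // The outer group is followed by *, so it may repeat zero times,
    // and the pattern has no anchors, so an empty match is accepted.
    return preg_match("/(.*(&lt;code&gt;).*(&lt;\/code&gt;).*)*/", $string) === 1;
}

var_dump(matchesQuestionPattern(''));                          // bool(true)
var_dump(matchesQuestionPattern('no tags here'));              // bool(true)
var_dump(matchesQuestionPattern('&lt;code&gt; never closed')); // bool(true)
```

Because an empty match at the start of the string is enough, the call succeeds for any input, which is why the condition in the question is always true.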

Toto · Accepted Answer · 2012-03-08T18:28:12.583

This works with the 2 given test cases:

$arr = array('lalala text <code>dlalala lala code</code> lalalal.',
             'lala <code> lalala or lalal </code> <code> lalala <code>alala');
foreach ($arr as $str) {
    echo "$str\n";
    if (preg_match('#^(?<!<code>).*<code>.*?</code>(?!.*<code>)#', $str)) {
        echo "===> Match\n";
    } else {
        echo "===> Not match\n";
    }
}

output:

lalala text <code>dlalala lala code</code> lalalal.
===> Match
lala <code> lalala or lalal </code> <code> lalala <code>alala
===> Not match

Some explanation about the regex:

#           : regex delimiter
^           : beginning of string
  (?<!      : start negative lookbehind
    <code>  : literally <code>
  )         : end of lookbehind
  .*        : any char, any number of times
  <code>    : literally <code>
  .*?       : any char, any number of times, not greedy
  </code>   : literally </code>
  (?!       : start negative lookahead
    .*      : any char, any number of times
    <code>  : literally <code>
  )         : end of lookahead
#           : regex delimiter

You can find some useful information about lookaround here
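The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming PCRE's x (extended) modifier, which ignores whitespace in the pattern and treats # as the start of a comment. The ~ delimiter (chosen so that # stays free for comments) and the function name matchesAnswerPattern are choices made for this example, not part of the answer.

```php
<?php
// Hypothetical helper: the same pattern as in the answer, written with the
// x modifier so each part carries its own comment.
function matchesAnswerPattern(string $str): bool
{
    $pattern = '~
        ^               # beginning of string
        (?<!<code>)     # negative lookbehind: not preceded by <code>
        .*              # any characters, greedy
        <code>          # literal <code>
        .*?             # any characters, not greedy
        </code>         # literal </code>
        (?!.*<code>)    # negative lookahead: no further <code> afterwards
    ~x';
    return preg_match($pattern, $str) === 1;
}

// The two test cases from the question:
var_dump(matchesAnswerPattern('lalala text <code>dlalala lala code</code> lalalal.'));
// bool(true)
var_dump(matchesAnswerPattern('lala <code> lalala or lalal </code> <code> lalala <code>alala'));
// bool(false)
```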

It seems to work. Can you describe how it works just a little bit? — good_evening, Mar 08 '12 at 17:53

score 4 · Answer 2 · edited May 23 '17 at 12:31

You could probably make the quantifier lazy by adding the "greed killer" ? to your code expression (more info here: Matching text between delimiters: greedy or lazy regular expression?), so if you have code like this:

<code>foo</code> another <code>bar</code>

it will match just foo and bar, not foo</code> another <code>bar. You should also use preg_match_all() (with the PREG_OFFSET_CAPTURE flag) and write your own parser, or rather use preg_replace_callback() like this:

// Just a strtolower() example (this is where the formatting would go)
function myCallback($matches) {
    // $matches[2] is the text between the opening and closing tags
    return strtolower($matches[2]);
}

$string = preg_replace_callback("/(&lt;code&gt;)(.*?)(&lt;\/code&gt;)/si", 'myCallback', $string);

Note the question mark in .*?. You should also use the s and i modifiers so that your code works on input like this:

lorem ipsum <code>
foo
</code> bar

If you need validation you can use this:

$string = preg_replace("/(&lt;code&gt;).*?(&lt;\/code&gt;)/si", '', $string);
if ((strpos($string, '<code') !== false) || (strpos($string, '</code') !== false)) {
    echo 'Invalid code';
}
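The greedy versus lazy difference described in this answer can be seen with preg_match_all(). This is a small sketch under two assumptions: the input uses literal tags, as in the foo/bar sample above, rather than HTML entities, and ~ is used as the delimiter so the slash in </code> needs no escaping.

```php
<?php
$input = '<code>foo</code> another <code>bar</code>';

// Greedy: .* runs to the last </code>, so both blocks collapse into one match.
preg_match_all('~<code>(.*)</code>~s', $input, $greedy);

// Lazy: .*? stops at the first </code>, giving one match per block.
preg_match_all('~<code>(.*?)</code>~s', $input, $lazy);

print_r($greedy[1]); // one element: foo</code> another <code>bar
print_r($lazy[1]);   // two elements: foo and bar
```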

score 3 · Answer 3 · answered Mar 06 '12 at 22:01

<?php
$string = "aaa<code>asd</code>aaaasd";
if (preg_match("#[a-zA-Z ]+<code>[a-zA-Z ]+<\/code>[a-zA-Z ]+#", $string))
{
    echo "It's a match!\n";
} else {
    echo "No match, sorry.\n";
}

Marian Zburlea

score 2 · Answer 4 · answered Mar 06 '12 at 19:00

    $string = "aaa<code>dlalala lala code</code>aaa";
    if (preg_match("#.*<code>.*<\/code>.*#", $string)) {
            echo "OK\n";
    } else {
            echo "NOK\n";
    }

dAm2K

webfan · Answer 5 · 2012-03-07T21:18:25.967

Try this:

$string = "lalala text <code>dlalala lala code</code> lalalal";

// Test the result of preg_match() itself, so $code[1] is only read on a match
if (preg_match("/<code>(.*)<\/code>/", $string, $code)) {
    echo $code[1];
} else {
    echo "no code found";
}

output will be:

dlalala lala code

Good luck :)

answered Mar 07 '12 at 21:11

score 2 · Answer 6 · edited May 23 '17 at 11:45

<?php
$sample_text = <<<EOF
blah blah
<code>one</code>
foo<code>two</code>three</code>
<code><code>four</code>bar
</code><code>five</code>foobar
<code>six</code>
blah blah blah
EOF;

preg_match_all('/<code>(?\'code\'((?!<\/?code>).)*)<\/code>/', $sample_text, $codes);

print_r($codes); 
?>

I believe this is what you're looking for. I referred to here and tested the regex here.
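To make it easier to see what the pattern above captures, here is a reduced sketch. The function name extractCodeBlocks and the shortened sample string are assumptions for illustration. It also swaps the answer's (?'code'...) group for the equivalent (?P<body>...) syntax, makes the inner group non-capturing, and uses ~ as the delimiter.

```php
<?php
// Hypothetical helper: returns the contents of every <code>...</code> pair
// whose body contains neither <code> nor </code>.
function extractCodeBlocks(string $text): array
{
    // (?:(?!</?code>).)* consumes one character at a time and refuses to
    // step onto the start of an opening or closing code tag.
    preg_match_all('~<code>(?P<body>(?:(?!</?code>).)*)</code>~', $text, $m);
    return $m['body'];
}

// Two lines taken from the sample text in the answer.
$sample = "foo<code>two</code>three</code>\n<code><code>four</code>bar";
print_r(extractCodeBlocks($sample)); // two elements: two and four
```

Without the s modifier the dot does not match a newline, so a block whose body spans several lines is skipped by this sketch.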

answered Mar 10 '12 at 15:41

ZagNut

fred2 · Answer 7 · 2012-02-26T16:17:17.670

You'd be better off using strpos for simple string comparisons like this, which means you don't have to worry about escaping special characters, and it's faster.

This will work

EDITED TO GET POSITION OF TAGS.

$string = "This string has 'Anything <code> anything </code> anything' in it in the right order.";
$start = strpos($string, '<code>');
$end = strpos($string, '</code>');
if ($start !== FALSE && $end !== FALSE && $end > $start){
     echo $string;
}else{
    echo 'incorrectly formatted';
}

IF YOU WANT TO USE PREG_MATCH

if (preg_match("/.*(<code>).*(<\/code>).*/", $string)) {
    echo $string;
}

Note - you don't want to use HTML entities unless you are sure the string is formatted using HTML entities. You don't need the outer set of parentheses.

answered Feb 26 '12 at 04:24

If this answer does not do what you want, can you tell me why and in what way? – fred2 Mar 06 '12 at 04:48
You may have reasons for preferring preg_match, but your comment takes me back to strpos. Strpos makes sure that there isn't a </code> in front of a <code> because it stops checking after the first match. Otherwise I'd refer you to the first answer at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and suggest an XML parser. In other words - regex sucks at validating HTML and is best avoided. – fred2 Mar 09 '12 at 22:10
Final answer - try http://querypath.org/ I've never used it, but it looks a much better way to handle HTML input and get rid of the garbage. – fred2 Mar 09 '12 at 22:29
