I'm modifying PHP Markdown (a PHP parser for the Markdown markup language used here on Stack Overflow), trying to implement points 1, 2 and 3 described by Jeff in this blog post. The last two were easy, but the first one is proving very difficult:

  1. Removed support for intra-word emphasis like_this_example

In fact, in the "normal" Markdown implementation, like_this_example would be rendered as likethisexample with "this" in italics. This is very undesirable; I want only a whole word wrapped in underscores, such as _example_, to become an italic "example".

I looked in the source code and found the regex used to do the emphasis:

var $em_relist = array(
    ''  => '(?:(?<!\*)\*(?!\*)|(?<!_)_(?!_))(?=\S|$)(?![.,:;]\s)',
    '*' => '(?<=\S|^)(?<!\*)\*(?!\*)',
    '_' => '(?<=\S|^)(?<!_)_(?!_)',
    );
var $strong_relist = array(
    ''   => '(?:(?<!\*)\*\*(?!\*)|(?<!_)__(?!_))(?=\S|$)(?![.,:;]\s)',
    '**' => '(?<=\S|^)(?<!\*)\*\*(?!\*)',
    '__' => '(?<=\S|^)(?<!_)__(?!_)',
    );
var $em_strong_relist = array(
    ''    => '(?:(?<!\*)\*\*\*(?!\*)|(?<!_)___(?!_))(?=\S|$)(?![.,:;]\s)',
    '***' => '(?<=\S|^)(?<!\*)\*\*\*(?!\*)',
    '___' => '(?<=\S|^)(?<!_)___(?!_)',
    );

I tried opening it in RegexBuddy, but that wasn't enough: after half an hour of working on it I still don't know where to start. Any suggestions?

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Andreas Bonini
  • These regexes confuse me so I'm not too sure, but I think you can use `\b` (word boundary) to check that the underscores are at the beginning and the end of a word. For instance, `\bfoo\b` will match `foo` in "foo bar" but not in "foobar" nor "barfoo". Just for reference (I'm quite sure you can't use it as is) `/\b_(.+?)_\b/` does a pretty good job on the inputs I've tested. – zneak Aug 01 '10 at 01:06
  • hey this is not an answer but you guys are the experts.. can you help me this beginner level problem of WMD http://stackoverflow.com/questions/3616788/wmd-how-to-get-the-generated-markdown-html-code – Moon Sep 01 '10 at 10:01
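
The word-boundary idea from zneak's comment can be checked in isolation. The following is only a sketch, not PHP Markdown code: `find_underscore_word` is a hypothetical helper name, and the pattern is the one quoted in the comment without any of the parser's other rules.

```php
<?php
// Hypothetical helper: returns the emphasized text, or null when nothing matches.
function find_underscore_word(string $text): ?string
{
    // "_" is a word character, so \b_ requires a non-word character (or the
    // start of the string) before the opening underscore, and _\b requires a
    // non-word character (or the end of the string) after the closing one.
    if (preg_match('/\b_(.+?)_\b/', $text, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(preg_match('/\bfoo\b/', 'foo bar'));          // int(1)
var_dump(preg_match('/\bfoo\b/', 'foobar'));           // int(0)
var_dump(preg_match('/\bfoo\b/', 'barfoo'));           // int(0)
var_dump(find_underscore_word('like_this_example'));   // NULL
var_dump(find_underscore_word('only _example_ here')); // string(7) "example"
```

Because the underscore itself counts as a word character, the boundary test is what rejects underscores sitting in the middle of a word.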

2 Answers

I use RegexBuddy too. :)

You may want to try the following code:

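<?php

$line1 = "like_this_example";
$line2 = "I want only _example_ to become example";

// "_" counts as a word character, so \b_ only matches an underscore that
// starts a word and _\b only one that ends a word.
$pattern = '/\b_(?P<word>.*?)_\b/si';

// No match: both underscores sit inside a word, so nothing is printed.
if (preg_match($pattern, $line1, $matches))
{
  $result = $matches['word'];
  var_dump($result);
}

// Matches: prints string(7) "example".
if (preg_match($pattern, $line2, $matches))
{
  $result = $matches['word'];
  var_dump($result);
}

?>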
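
To go from detecting a match to producing output, the same boundary idea might be used with `preg_replace`. This is a minimal sketch under stated assumptions, not a drop-in change for PHP Markdown: the function name `emphasize_whole_words` is hypothetical, and the extra `(?!_)` and `(?<!_)` checks are my addition so that double underscores are left alone for a separate strong rule.

```php
<?php
// Hypothetical helper, not part of PHP Markdown.
function emphasize_whole_words(string $text): string
{
    // \b_     : opening underscore must start a word
    // (?!_)   : assumption, skip "__" so strong emphasis is untouched
    // (.+?)   : shortest possible emphasized text
    // (?<!_)  : assumption, emphasized text must not end in an underscore
    // _\b     : closing underscore must end a word
    $pattern = '/\b_(?!_)(.+?)(?<!_)_\b/s';

    // Fall back to the input if preg_replace reports an error (null).
    return preg_replace($pattern, '<em>$1</em>', $text) ?? $text;
}

echo emphasize_whole_words('like_this_example'), "\n";
// like_this_example
echo emphasize_whole_words('I want only _example_ to become example'), "\n";
// I want only <em>example</em> to become example
echo emphasize_whole_words('__bold__'), "\n";
// __bold__
```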
Box

I was able to grab only individual _enclosed_ words via:

$input = 'test of _this_ vs stuff_like_this...and here is _anothermatch_ and_another_fake_string';
$pattern = '#(?<=\s|^)(?<!_)(_[^_]*_)(?!_)#is';
preg_match_all($pattern, $input, $matches);
print_r($matches);

I'm not sure exactly how that would fit into the code in the question, though. You would probably need to pair it with the patterns below to cover the double-underscore and triple-underscore cases:

// Double underscores:
$pattern = '#(?<=\s|^)(?<!_)(__[^_]*__)(?!_)#is';
// Triple underscores (use instead of the pattern above, not after it):
$pattern = '#(?<=\s|^)(?<!_)(___[^_]*___)(?!_)#is';
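
One way the three patterns might be combined is to apply them in turn with `preg_replace`, trying the longest delimiter first. Treat this as a rough sketch with assumptions: `render_underscore_emphasis` is a hypothetical name, the capture group is moved inside the underscores so the replacement can drop them, `[^_]*` is tightened to `[^_]+`, and the closing check is widened from `(?!_)` to `(?!\w)` so that a string like `_foo_bar` is not treated as emphasis.

```php
<?php
// Hypothetical helper, not part of PHP Markdown.
function render_underscore_emphasis(string $text): string
{
    // Each opening delimiter must follow whitespace or the start of the
    // string; each closing delimiter must not be followed by a word
    // character (letters, digits, or another underscore).
    $rules = array(
        '#(?<=\s|^)___([^_]+)___(?!\w)#' => '<strong><em>$1</em></strong>',
        '#(?<=\s|^)__([^_]+)__(?!\w)#'   => '<strong>$1</strong>',
        '#(?<=\s|^)_([^_]+)_(?!\w)#'     => '<em>$1</em>',
    );

    foreach ($rules as $pattern => $replacement) {
        // Keep the previous text if preg_replace reports an error (null).
        $text = preg_replace($pattern, $replacement, $text) ?? $text;
    }
    return $text;
}

$input = 'test of _this_ vs stuff_like_this...and here is _anothermatch_ and_another_fake_string';
echo render_underscore_emphasis($input), "\n";
// test of <em>this</em> vs stuff_like_this...and here is <em>anothermatch</em> and_another_fake_string
```

Note that requiring whitespace before the opening underscore means something like `(_this_)` is not matched; whether that is acceptable depends on the input you expect.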
Jeffrey Blake