I'm having trouble finding a regex that matches the start and end characters of a PHP class, which are { and } respectively. The regex should not match a { or } that sits inside a PHP comment; in other words, it should not match a { or } that is preceded on its line by any character other than whitespace.
I suppose I should use a negative lookbehind, but I'm a little rusty on regex, and so far I haven't found the solution.
Here is my test string:
<?php
namespace Ling\Light_TaskScheduler\Service;
/**
 * The LightTaskSchedulerService class. :{
 */
class LightTaskSchedulerService
{
    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
    }
}
// this can happen in comments: }, why
// more stuff
And my pattern, which doesn't work at the moment, is this:
if (preg_match('!^\s*\{\s*(.*)(?<![^\s]*)\}!ms', $c, $match)) {
    a($match);
}
I used the multiline modifier "m" so that ^ matches at the start of every line rather than only at the start of the string, and the "s" modifier so that the dot also matches line breaks. But the negative lookbehind part (?<![^\s]*) doesn't seem to work. I'm basically trying to say "don't match the } character if it's preceded by anything other than whitespace".
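One likely reason, sketched below under the assumption of a stock PHP 7.x/8.x build with its bundled PCRE library: PCRE requires a lookbehind to have a bounded length, so the * inside (?<![^\s]*) makes the whole pattern fail to compile, and preg_match() returns false (with a warning) instead of 0 or 1. A single-character lookbehind such as (?<!\S) compiles fine. The helper name patternCompiles() is hypothetical.

```php
<?php
// Hypothetical helper: true if PCRE accepts the pattern, false if it fails to compile.
// The @ suppresses the compile-time warning so only the return value is inspected.
function patternCompiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

// Unbounded lookbehind: rejected by PCRE, so preg_match() returns false.
var_dump(patternCompiles('!^\s*\{\s*(.*)(?<![^\s]*)\}!ms')); // bool(false)

// Fixed-length lookbehind (exactly one character): accepted.
var_dump(patternCompiles('!^\s*\{\s*(.*)(?<!\S)\}!ms'));     // bool(true)
```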
@Wiktor Stribiżew: I tried this pattern, but it still doesn't work: !^\s*\{\s*(.*)(?<!\S)\}!ms
Considering Tim Biegeleisen's comment, I'll probably take a simpler approach: remove the comments first, then apply the simpler regex !^\s*\{\s*(.*)\}!ms, which I know will work.
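The "remove the comments first" route can be sketched with PHP's own tokenizer (token_get_all()) instead of a regex, so that braces inside comments are never seen by the final pattern. This is a minimal sketch under the assumptions that the source starts with an <?php tag, is valid PHP and declares a single class; stripComments() and extractClassBody() are hypothetical names, and braces inside string literals are not handled.

```php
<?php
// Hypothetical helper: removes //, #, /* */ and /** */ comments using token_get_all().
// Each comment is replaced by the newlines it contained, so the remaining code
// keeps its original line structure (and line numbers).
function stripComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character tokens such as { } ; (
            continue;
        }
        // $token is [token id, token text, line number]
        if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            $out .= str_repeat("\n", substr_count($token[1], "\n"));
        } else {
            $out .= $token[1];
        }
    }
    return $out;
}

// Hypothetical helper: with comments gone, the simpler greedy regex
// !^\s*\{\s*(.*)\}!ms captures everything between the class braces.
function extractClassBody(string $source): ?string
{
    if (preg_match('!^\s*\{\s*(.*)\}!ms', stripComments($source), $match) === 1) {
        return $match[1];
    }
    return null;
}
```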
However, if somebody knows a regex that does it, I would be interested in seeing it.
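For the regex-only route, note that the (?<!\S) attempt still accepts the } in "// this can happen in comments: }, why", because that brace is preceded by a space, and the greedy (.*) runs to the last such brace in the string. A sketch that instead requires each brace to be the only non-blank character on its line could look like the following. It assumes the class braces sit on their own lines (as in the test string), and it would still be fooled by a comment line consisting of nothing but "}"; findClassBody() is a hypothetical name.

```php
<?php
// Hypothetical helper. Assumption: the class's opening and closing braces
// each sit alone on their own line (only spaces/tabs around them).
//   m flag: ^ and $ match at every line start/end
//   s flag: . also matches newlines, so (.*) can span the class body
//   (.*) is greedy, so it runs to the LAST line consisting only of "}"
function findClassBody(string $code): ?string
{
    $pattern = '/^[ \t]*\{[ \t]*$(.*)^[ \t]*\}[ \t]*$/ms';
    if (preg_match($pattern, $code, $match) === 1) {
        return $match[1];
    }
    return null;
}
```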
Problem solved for now, I'm out, thanks guys.
@Wiktor Stribiżew: the weird thing is that your regex works on the regex101 website, but not in my version of PHP (PHP 7.2.31).
Concretely, this doesn't work in my PHP environment:
$c = <<<'EEE'
<?php
/**
 * The LightTaskSchedulerService class. :{
 */
class LightTaskSchedulerService
{
    /**
     *
     * This method IS the task manager.
     * See the @page(Light_TaskScheduler conception notes) for more details.
     *
     */
    public function run()
    {
        $executionMode = $this->options['executionMode'] ?? "lastOnly";
        $this->logDebug("Executing run method with execution mode \"$executionMode\".");
    }
}
// this can happen in comments: }, why
// more stuff
EEE;
if (preg_match('/^\s*\{\s*(.*)(?<!\S)\}$/gms', $c, $match)) {
    echo "a match was found"; // is never displayed
}
exit;
So I don't know what regex101 is doing differently under the hood, but it doesn't work for me.
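A likely cause worth checking: on regex101, "g" is a separate flag meaning "find all matches", but PHP has no g pattern modifier. preg_match() therefore rejects '/.../gms' with an "Unknown modifier 'g'" warning and returns false, so the if body never runs. In PHP, the equivalent of a global match is preg_match_all(). The sketch below compares the pattern with and without the "g" on a shortened, made-up stand-in for the test string.

```php
<?php
// Shortened stand-in for the test string (hypothetical content).
$c = <<<'EEE'
<?php
class A
{
    public function run()
    {
    }
}
// this can happen in comments: }, why
EEE;

// With "g": PHP treats it as an unknown modifier, so the pattern is invalid
// and preg_match() returns false (the @ hides the warning for this demo).
$withG = @preg_match('/^\s*\{\s*(.*)(?<!\S)\}$/gms', $c, $match);
var_dump($withG); // bool(false)

// Without "g": the same pattern is valid PHP, and preg_match() returns 1.
$withoutG = preg_match('/^\s*\{\s*(.*)(?<!\S)\}$/ms', $c, $match);
var_dump($withoutG); // int(1)
```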
UPDATE
As Tim suggested, regex might not be the most appropriate tool for this job.
I ended up using a very simple solution to find the end character, and something similar can be applied to find the start character:
/**
 * Returns an array containing information related to the end of the class.
 *
 * Important note: this method assumes that:
 *
 * - the parsed php file contains valid php code
 * - the parsed php file contains only one class
 *
 * If either of the above assumptions is not true, this method won't work properly.
 *
 * The returned array has the following structure:
 *
 * - endLine: int|null, the number of the line containing the class declaration's last char (null if not found)
 * - lastLineContent: string|null, the content of the last line being part of the class declaration (null if not found)
 *
 * @return array
 */
public function getClassLastLineInfo(): array
{
    $lastLineNumber = null;
    $lastLineContent = null;
    $lines = file($this->file);
    $reversedLines = array_reverse($lines);
    foreach ($reversedLines as $k => $line) {
        if ('}' === trim($line)) {
            $n = count($lines);
            // $k = 0 is the file's last line, whose 1-based number is $n
            $lastLineNumber = $n - $k;
            $lastLineContent = $line;
            break;
        }
    }
    return [
        "endLine" => $lastLineNumber,
        "lastLineContent" => $lastLineContent,
    ];
}
With something similar for the start character, we can obtain the line numbers of both the start and end characters of the class. Armed with those, we can read the file's lines into an array and use a combination of array_slice() and implode() to "recompile" the content of the class.
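A minimal sketch of that idea, working on an array of lines (as returned by file()) rather than on $this->file. The same assumptions as the docblock above apply (valid PHP, one class), plus the assumption that the class's opening brace is the first line containing only "{". getClassLineRange() and extractClass() are hypothetical names.

```php
<?php
// Hypothetical helper: returns [startLine, endLine] (1-based) of the class braces,
// or null if either brace line is missing.
// Opening brace: FIRST line whose trimmed content is "{".
// Closing brace: LAST line whose trimmed content is "}".
function getClassLineRange(array $lines): ?array
{
    $start = null;
    foreach ($lines as $i => $line) {
        if ('{' === trim($line)) {
            $start = $i + 1; // 0-based index -> 1-based line number
            break;
        }
    }

    $end = null;
    $n = count($lines);
    foreach (array_reverse($lines) as $k => $line) {
        if ('}' === trim($line)) {
            $end = $n - $k; // same arithmetic as getClassLastLineInfo()
            break;
        }
    }

    return ($start !== null && $end !== null) ? [$start, $end] : null;
}

// Hypothetical helper: "recompiles" the class block, braces included.
function extractClass(array $lines): ?string
{
    $range = getClassLineRange($lines);
    if ($range === null) {
        return null;
    }
    [$start, $end] = $range;
    return implode('', array_slice($lines, $start - 1, $end - $start + 1));
}
```

Because file() keeps each line's trailing newline, implode('') with an empty glue reassembles the slice exactly as it appeared in the file.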
Anyway, thanks for the comments.