I want to match a // in a line (each line is an element of an array) and replace it with a blank space, but only if it is not inside " ". For example, given these lines:

//this is a test
"this is a // test"

the output should be:


"this is a // test"

That is, any // inside "" should be ignored.

So far I've come up with this regex:

$string[$i] = preg_replace('#(?<!")//.*(?!")#',' ',$string[$i]);

but this doesn't work if the // is in the middle or at the end of a line.

Jason Orendorff
Leaf Sy
  • Unfortunately your problem is one of a particular sort that is not well-suited to be solved by regex. Check out this tongue-in-cheek answer to a similar question: http://stackoverflow.com/a/1732454/145999 – HugoRune Aug 04 '12 at 23:05
  • [What is the XY Problem?](http://meta.stackexchange.com/q/66377/164291) –  Aug 04 '12 at 23:22
  • Are you performing this operation on PHP code (i.e., a string that you could parse using [`token_get_all()`](http://php.net/token_get_all))? –  Aug 04 '12 at 23:25
  • Is it possible to process the text line-by-line, or do you need to be able to account for quotes that go across multiple lines (e.g., `"this is \n a // \n test"`)? –  Aug 04 '12 at 23:28
  • Thanks @HugoRune, I spent the whole night trying to solve this. I'll come back to post any updates so that I can help others with the same problem. – Leaf Sy Aug 04 '12 at 23:28
  • @Phoenix We are not allowed to use PHP's built-in tokenizer. Our subject (unfortunately) is Compiler Design, and our first task is to write a lexical analyzer (tokenizer). We do not need to consider quotes that go across multiple lines. We need to mimic a C compiler. – Leaf Sy Aug 04 '12 at 23:33
  • How about a `.*` before `//`? – Dogbert Aug 04 '12 at 23:37
  • Then regex should be forbidden anyway. That’s not how a compiler works. – fuxia Aug 04 '12 at 23:37
  • @Dogbert How would you detect `//` inside a string? – Arjan Aug 04 '12 at 23:39
  • @Dogbert I've tried putting .* before the // with no luck; it just proceeds with the negative lookahead and does not process the negative lookbehind. My idea is to make a regex that checks whether the // is inside " " or not. To do that, I used a negative lookbehind (?<!") followed by //, which checks whether there is a " before the //, and a negative lookahead //.*(?!"), which checks whether there is a " after the //. If there is no " before and after the //, then the // and all characters after it are replaced with a blank space. – Leaf Sy Aug 04 '12 at 23:47
  • @toscho - Why disallow regexs? Scanner generators often use regexs. See lex, flex. A compiler class is a great place to learn this. – walrii Aug 05 '12 at 00:04
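
The lookaround idea discussed in the comments only inspects the single character next to the //, so it cannot tell whether an earlier quote is still open. One common regex workaround, sketched here as an illustration and not taken from the thread, is to match either a whole quoted string or a comment and decide in a callback which one was found. The function name `strip_line_comment` is hypothetical, and the sketch assumes balanced quotes on a single line with no escaped quotes (`\"`).

```php
<?php
/**
 * Hypothetical helper (not from the thread): remove a '//' comment from one
 * line unless the '//' sits inside a double-quoted string.
 * Assumes balanced quotes on one line and no escaped quotes (\").
 */
function strip_line_comment(string $line): string
{
    // Alternation order matters: at each position a quoted string is tried
    // first, so any '//' inside it is consumed as part of the string.
    return preg_replace_callback(
        '#"[^"]*"|//.*#',
        function (array $m): string {
            // Keep quoted strings untouched; drop comments.
            return $m[0][0] === '"' ? $m[0] : '';
        },
        $line
    );
}

echo strip_line_comment('"this is a // test" // trailing'), PHP_EOL;
```

The callback returns an empty string for a comment; returning `' '` instead would match the blank-space replacement used in the question's `preg_replace` call.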

2 Answers

Since you don't have to worry about quotes that span multiple lines, your job is significantly easier.

One approach would be to go through the input line-by-line and `explode()` each line using `"` as the delimiter:

$processed = '';

/* Split the input into an array with one (non-empty) line per element.
 *
 * Note that this also allows us to consolidate and normalize line endings
 *  in $processed.
 */
foreach( preg_split("/[\r\n]+/", $input) as $line )
{
  $split = explode('"', $line);

  /* Even-numbered indices inside $split are outside quotes. */
  $count = count($split);
  for( $i = 0; $i < $count; $i += 2 )
  {
    $pos = strpos($split[$i], '//');
    if( $pos !== false )
    {
      /* We have detected '//' outside of quotes.  Discard the rest of the line. */
      if( $i > 0 )
      {
        /* If $i > 0, then we have some quoted text to put back. */
        $processed .= implode('"', array_slice($split, 0, $i)) . '"';
      }

      /* Add all the text in the current token up until the '//'. */
      $processed .= substr($split[$i], 0, $pos);

      /* Go to the next line. */
      $processed .= PHP_EOL;
      continue 2;
    }
  }

  /* If we get to this point, we detected no '//' sequences outside of quotes. */
  $processed .= $line . PHP_EOL;
}
echo $processed;

Using the following test string:

<?php
$input = <<<END
//this is a test
"this is a // test"
"Find me some horsemen!" // said the king, or "king // jester" as I like to call him.
"I am walking toward a bright light." "This is a // test" // "Who are you?"
END;

We get the following output:


"this is a // test"
"Find me some horsemen!" 
"I am walking toward a bright light." "This is a // test" 
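
Because the comments describe the task as writing a lexical analyzer by hand, the same "am I inside quotes?" idea can also be sketched as a character-by-character scanner. This is only an illustration under the same assumptions as above (no multi-line strings) plus one more: no escaped quotes. The name `cut_line_comment` is hypothetical.

```php
<?php
/**
 * Hypothetical scanner (illustrative name, not part of the code above).
 * Walks one line character by character, tracking whether the current
 * position is inside a double-quoted string, and cuts the line at the
 * first '//' found outside of one.
 * Assumes no escaped quotes and no multi-line strings.
 */
function cut_line_comment(string $line): string
{
    $inString = false;
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        if ($line[$i] === '"') {
            // Every double quote toggles the string state.
            $inString = !$inString;
        } elseif (!$inString && $line[$i] === '/'
            && $i + 1 < $length && $line[$i + 1] === '/') {
            // First '//' outside a string: drop it and everything after it.
            return substr($line, 0, $i);
        }
    }

    // No comment found; return the line unchanged.
    return $line;
}

$lines = [
    '//this is a test',
    '"this is a // test"',
    '"Find me some horsemen!" // said the king',
];

foreach ($lines as $line) {
    echo cut_line_comment($line), PHP_EOL;
}
```

A single boolean is enough here because quotes cannot nest; handling escapes or other token types would need more states.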

I don't know about RegEx but you can easily achieve this using substr_replace and strpos:

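$look = 'this is a "//" test or not //';
$output = '';

/* Compare against false explicitly: strpos() returns 0 when '//' starts the string. */
while (($pos = strpos($look, '//')) !== false)
{
    if ($pos > 0 && $look[$pos - 1] === '"') {
        /* The '//' directly follows a quote, as in '"//"': keep it,
         * together with the two characters after it. */
        $output .= substr($look, 0, $pos + 4);
        $look = (string) substr($look, $pos + 4);
        continue;
    }

    /* Otherwise remove the two slashes and keep the text before them. */
    $output .= substr_replace(substr($look, 0, $pos + 2), '', $pos, 2);
    $look = (string) substr($look, $pos + 2);
}

/* Append whatever text follows the last '//'. */
$output .= $look;

// $output is now 'this is a "//" test or not '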
Chibueze Opata