I have strings containing dash (-) characters. I want to extract the portion of each string before the first dash, except where that dash is inside single or double quotes.

For example:

From `Theory 'Gabe B - Tailor' Jacket - nordstrom.com` I want to extract `Theory 'Gabe B - Tailor' Jacket`

From `Theory "Gabe B - Tailor" Jacket - nordstrom.com` I want to extract `Theory "Gabe B - Tailor" Jacket`

From `Tailor Jacket - Jackets - nordstrom.com` I want to extract `Tailor Jacket`

What regex can I use with preg_match to achieve the result?

Commonboy
  • something like `(.*?) - nordstrom.com` – Kin Jan 30 '13 at 21:09
  • Wouldn't it be easier to extract the portion of the string before the *last* dash? – Jon Jan 30 '13 at 21:11
  • See also [Open source RegexBuddy alternatives](http://stackoverflow.com/questions/89718/is-there) and [Online regex testing](http://stackoverflow.com/questions/32282/regex-testing) for some helpful tools, or [RegExp.info](http://regular-expressions.info/) for a nicer tutorial. – mario Jan 30 '13 at 21:14
  • Googling tip: You didn't find anything due to your expressionless question [title](http://meta.stackexchange.com/questions/10647/how-do-i-write-a-good-title/112966#112966). If you had entered "regex match text between quotes", you would have found dozens of results. – mario Jan 30 '13 at 21:17

3 Answers

You could use an expression like this to handle single and double quoting (without escapes):

(?:[^-"']+|"[^"]*"|'[^']*')+

Or just capture everything till the last -:

(.+)-
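
As a rough sketch of how a quote-aware pattern of this kind might be used with preg_match: the version below assumes the first alternative also excludes quote characters (`[^-"']+`), so that quoted segments are consumed by the quote alternatives rather than cut at the dash inside them. The `^` anchor and the `rtrim()` call are additions for illustration, not part of the expression above.

```php
<?php
// Sketch only: the ^ anchor and rtrim() are assumptions added for illustration.
// Alternatives: a run of non-dash, non-quote characters, a "double-quoted"
// segment, or a 'single-quoted' segment (no escape handling).
$pattern = '/^(?:[^-"\']+|"[^"]*"|\'[^\']*\')+/';

$inputs = [
    "Theory 'Gabe B - Tailor' Jacket - nordstrom.com",
    'Theory "Gabe B - Tailor" Jacket - nordstrom.com',
    'Tailor Jacket - Jackets - nordstrom.com',
];

$results = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $matches)) {
        // The match includes the space before the dash, so trim it.
        $results[] = rtrim($matches[0]);
    }
}

print_r($results);
```

For the three sample strings from the question, this should yield `Theory 'Gabe B - Tailor' Jacket`, `Theory "Gabe B - Tailor" Jacket` and `Tailor Jacket`.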
Qtax

How about a non-regex alternative?

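$input = "'Gabe B - Tailor' Jacket - nordstrom.com";

$quote = null; // the quote character we are currently inside, or null
$length = strlen($input);
for ($i = 0; $i < $length; $i++) {
    $char = $input[$i];

    // A dash outside quotes ends the portion we want.
    if ($quote === null && $char === '-') {
        break;
    }

    if ($quote === null && ($char === "'" || $char === '"')) {
        $quote = $char; // opening quote
    } elseif ($char === $quote) {
        $quote = null;  // matching closing quote
    }
}

// Drop the space left before the dash.
echo rtrim(substr($input, 0, $i));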
Tchoupi

I believe this regexp is what you're looking for:

([^-"']|"[^"]*"|'[^']*')*?(?=\s*\-)
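
A minimal sketch of how this lookahead pattern could be wrapped for use with preg_match. The helper name `extractBeforeDash` is hypothetical, and the `^` anchor and the non-capturing group are small assumptions added here; the full match in `$matches[0]` is the extracted portion.

```php
<?php
// Hypothetical helper; the name and the ^ anchor are illustrative additions.
function extractBeforeDash(string $input): ?string
{
    // Lazily consume plain characters or whole quoted segments, stopping
    // as soon as optional whitespace followed by a dash comes next.
    $pattern = '/^(?:[^-"\']|"[^"]*"|\'[^\']*\')*?(?=\s*-)/';

    // preg_match returns 1 on a match; return null when no unquoted dash is found.
    return preg_match($pattern, $input, $matches) === 1 ? $matches[0] : null;
}

echo extractBeforeDash("Theory 'Gabe B - Tailor' Jacket - nordstrom.com"), "\n";
echo extractBeforeDash('Tailor Jacket - Jackets - nordstrom.com'), "\n";
```

Because the whitespace before the dash sits inside the lookahead, the match should not need trimming: the two calls above are expected to print `Theory 'Gabe B - Tailor' Jacket` and `Tailor Jacket`.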