
I am trying to create a regex that finds instances of a string containing an unspaced /, e.g.:

some characters/morecharacters

I have come up with the expression below, which finds a word character or closing parenthesis before the / and an opening parenthesis or word character after it.

(\w|\))/(\(|\w)

This works well for most situations; however, I come unstuck when the / is enclosed in quotes, in which case I'd like it to be ignored. I have seen a couple of related posts, but I can't quite get them to work in my situation.
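To make the problem concrete, here is a minimal sketch that runs the expression above over the four sample lines listed further down. It assumes Python's `re` module, since the question does not name a language (a later comment refers to Python-style escaping). The quoted line matches too, because the pattern only looks at the single character on each side of the slash.

```python
import re

# The question's original expression: a word char or ")" before the slash,
# then "(" or a word char after it.
original = re.compile(r"(\w|\))/(\(|\w)")

# The four sample lines; ideally the last one would be ignored.
samples = [
    "some text/more text",
    "(formula)/dividethis",
    "divideme/(byme)",
    '"dont match/me"',
]

for line in samples:
    m = original.search(line)
    found = m.group(0) if m else None
    print(repr(line), "->", repr(found))
```

This prints `'t/m'`, `')/d'`, `'e/('` and `'h/m'` for the four lines, so the quoted line is not excluded.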

What I'd like is for the first three cases below to match and the last case to be ignored, allowing me to extract item 1 and item 3.

some text/more text
(formula)/dividethis
divideme/(byme)
"dont match/me"
Dan

2 Answers


It ain't pretty, but this will do what you want:

(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")

Demo on Regex101

Let's break it down a bit:

  • (?<!")(?:\(|\b) will match either an open bracket or a word boundary, as long as it's not preceded by a quotation mark. It does this by employing a negative lookbehind.
  • [^"\n]+ will match one or more characters, as long as they're neither a quotation mark nor a line break (\n).
  • \/ will match a literal slash character.
  • Finally, (?:\)|\b)(?!") will match either a closing bracket or a word boundary, as long as it's not followed by a quotation mark. It does this by employing a negative lookahead. Note that (?:\)|\b) only works correctly in this order: if you reverse the alternatives, the match drops the closing bracket, because the engine finds a word boundary just before the bracket and accepts that first.
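A quick way to check the breakdown is to run the pattern against the question's sample lines, one line at a time. The sketch below assumes Python's `re` module (the answer itself was tested on Regex101) and also compiles a hypothetical `reversed_alt` variant with the final alternation flipped to `(?:\b|\))`, purely to illustrate the ordering note in the last point.

```python
import re

# The answer's pattern as a Python raw string (\/ is just an escaped "/").
pattern = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\)|\b)(?!")')

# Hypothetical variant with the final alternation reversed, to show the ordering issue.
reversed_alt = re.compile(r'(?<!")(?:\(|\b)[^"\n]+\/[^"\n]+(?:\b|\))(?!")')

samples = [
    "some text/more text",
    "(formula)/dividethis",
    "divideme/(byme)",
    '"dont match/me"',
]

for line in samples:
    m = pattern.search(line)
    print(repr(line), "->", repr(m.group(0) if m else None))

# With \b tried first, the closing bracket is left out of the match.
print(repr(reversed_alt.search("divideme/(byme)").group(0)))
```

The first three lines match in full and the quoted line gives `None`, while `reversed_alt` returns `'divideme/(byme'` without the closing bracket. This check only covers the four sample lines, each on its own line.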
Sebastian Lenartowicz
  • That is great and certainly identifies the full string. Is it possible to split out the first part (i.e. before `/`) and the second part (i.e. after `/`)? – Dan Nov 14 '16 at 04:19
  • Actually, I figured it out: `((?<!\")(?:\(|\b)[^\"\n]+)/([^\"\n]+(?:\)|\b)(?!\"))`. This is based on the `python` approach to escaping rather than the `PHP` one. – Dan Nov 14 '16 at 04:23
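Assuming Python's `re` module, the two-group version from the comment above could be used along these lines: `search` finds the match and `groups()` returns the text before and after the slash. The pattern is pasted verbatim into a raw string; inside `r"..."` each `\"` stays a backslash followed by a quotation mark, and the regex engine reads that as a literal `"`.

```python
import re

# Dan's two-group pattern, copied verbatim into a raw string.
# Group 1: text before the slash; group 2: text after it.
split_pattern = re.compile(r"((?<!\")(?:\(|\b)[^\"\n]+)/([^\"\n]+(?:\)|\b)(?!\"))")

samples = [
    "some text/more text",
    "(formula)/dividethis",
    "divideme/(byme)",
    '"dont match/me"',
]

for line in samples:
    m = split_pattern.search(line)
    if m:
        before, after = m.groups()
        print(repr(line), "-> before:", repr(before), "after:", repr(after))
    else:
        print(repr(line), "-> no match")
```

Because the first `[^\"\n]+` is greedy, a line containing several slashes would be split at the last one (for example, `a/b/c` gives `a/b` and `c`); the question's samples only have one slash each.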

This matches word/word only when it is not inside quotation marks.

import re

text = """
some text/more text "dont match/me" divideme/(byme)
(formula)/dividethis
divideme/(byme) "dont match/me hel d/b lo a/b" divideme/(byme)
"dont match/me"
"""

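# Quoted spans hit the non-capturing alternative, so findall returns '' for them
groups = re.findall(r'(?:".*?")|(\S+/\S+)', text)
# filter(None, ...) drops those empty strings (Python 3 print syntax)
print(list(filter(None, groups)))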

Output:

['text/more', 'divideme/(byme)', '(formula)/dividethis', 'divideme/(byme)', 'divideme/(byme)']
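  • (?:\".*?\") matches a whole quoted string, quotes included, without capturing it, so findall returns an empty string for each of these matches.
  • (\S+/\S+) matches and captures word/word outside the quotation marks. Whenever the scan reaches an opening quotation mark, the first alternative consumes the entire quoted string, so a / inside it is skipped.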

Demo on Regex101
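The same "match the quoted part, capture the rest" alternation also works with `re.finditer`, which reports a group that did not take part in the match as `None` instead of the empty string `findall` returns, so the `filter` step is not needed. This is a sketch on the answer's own sample text; it keeps the `\S+/\S+` capture, so the first line still yields `text/more` rather than `some text/more text`.

```python
import re

text = """
some text/more text "dont match/me" divideme/(byme)
(formula)/dividethis
divideme/(byme) "dont match/me hel d/b lo a/b" divideme/(byme)
"dont match/me"
"""

pattern = re.compile(r'(?:".*?")|(\S+/\S+)')

# Each match is either a quoted span (group 1 is None) or an unquoted
# word/word token (group 1 holds it); keep only the latter.
tokens = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(tokens)
```

This prints the same list as the `findall` version. `filter(None, ...)` and the `is not None` check agree here, since `\S+/\S+` can never capture an empty string.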

Mohammad Yusuf