
I have a use case that requires the identification of many different pieces of text between any two characters.

For example,

  1. String between a single space and (: def test() would return test
  2. String between a word followed by a space (paste ) and a special character (/): @paste "game_01/01" would return "game_01
  3. String between a single space and ( with multiple target strings: } def test2() { Hello(x, 1) would return test2 and Hello

To do this, I'm attempting to write something generic that will identify the shortest string between any two characters.

My current approach is (from chrisz):

pattern = '{0}(.*?){1}'.format(re.escape(separator_1), re.escape(separator_2))

And for the first use case, separator_1 = \s and separator_2 = (. This isn't working, so evidently I am missing something, but I am not sure what.

tl;dr How can I write a generic regex to parse the shortest string between any two characters?

  • Note: I know there are many examples of this but they seem quite specific and I'm looking for a general solution if possible.
Black
  • What does _"This isn't working"_ mean exactly? Is there an error message? Does it not produce the output you expected? – Aran-Fey Feb 16 '18 at 03:18
  • this is not a free coding service, a free tutorial service or a free design service. – Mad Physicist Feb 16 '18 at 03:20
  • @Aran-Fey it doesn't produce what I'm hoping for. For instance, a space will be included before the target string (e.g. ` test`), and sometimes characters after the string will be included (e.g. `test()`) – Black Feb 16 '18 at 03:22
  • Do you have to use regex? Why not write a simple function? – FatihAkici Feb 16 '18 at 05:24
  • @FatihAkici actually I don't have to, but I thought this would be simpler and more generic, and I'm also curious to see what the solution would be – Black Feb 16 '18 at 05:30

1 Answer


Let me know if this is what you are looking for:

import re

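def smallest_between_two(a, b, text):
    # Escape both separators so they match literally, capture lazily between
    # them, then keep the shortest capture.
    # Note: min() raises ValueError when there are no matches at all.
    pattern = re.escape(a) + "(.*?)" + re.escape(b)
    return min(re.findall(pattern, text), key=len)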

print(smallest_between_two(' ', '(', 'def test()'))
print(smallest_between_two('[', ']', '[this one][not this one]'))
print(smallest_between_two('paste ', '/', '@paste "game_01/01"'))

Output:

test
this one
"game_01

To explain what this does:

re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings

re.escape()

Escape all the characters in pattern except ASCII letters and numbers. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it
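As a rough illustration of why this matters for the separators in the question: assuming separator_1 was passed as the two-character string `\s` (the question does not confirm this), re.escape() turns it into a pattern for a literal backslash followed by s, not a whitespace class. The helper name find_between is hypothetical.

```python
import re

def find_between(a, b, text):
    # Hypothetical helper: escape both separators, capture lazily between them.
    return re.findall(re.escape(a) + "(.*?)" + re.escape(b), text)

# Assumption: the separator was given as the literal string r"\s".
# After escaping, the pattern looks for a backslash followed by "s",
# which never occurs in this text, so nothing matches.
print(find_between(r"\s", "(", "def test()"))  # []

# Passing an actual space character matches as intended.
print(find_between(" ", "(", "def test()"))  # ['test']
```

So with re.escape() in place, the separators should be the literal characters themselves rather than regex shorthand.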

(.*?)

. matches any character (except for line terminators)

*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)

So our regular expression matches any character (not including line terminators) between two arbitrary escaped strings, and then returns the shortest length string from the list that re.findall() returns.
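To make the lazy quantifier concrete, here is a small sketch comparing `.*?` with the greedy `.*` on one of the inputs above. The shortest_between variant is hypothetical; it only adds `default=None` so that an input with no matches returns None instead of raising.

```python
import re

text = "[this one][not this one]"

# Lazy: each capture stops at the first closing bracket it can reach.
lazy = re.findall(r"\[(.*?)\]", text)
print(lazy)  # ['this one', 'not this one']

# Greedy: the capture runs on to the last closing bracket.
greedy = re.findall(r"\[(.*)\]", text)
print(greedy)  # ['this one][not this one']

def shortest_between(a, b, text):
    # Hypothetical variant of smallest_between_two: default=None avoids
    # the ValueError that min() raises on an empty list of matches.
    matches = re.findall(re.escape(a) + "(.*?)" + re.escape(b), text)
    return min(matches, key=len, default=None)

print(shortest_between("[", "]", text))           # this one
print(shortest_between("[", "]", "no brackets"))  # None
```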

user3483203
  • This doesn't seem to work when there are multiple expressions on the same line, but I believe I can work around this. It seems like `re.escape` was the thing that I was missing. Thanks for your help. – Black Feb 16 '18 at 03:43
  • What do you mean, when there are multiple expressions on the same line? – user3483203 Feb 16 '18 at 03:44
  • For instance, `def test() hello()` will only get `test` and not `hello`. Possibly due to the `min` over `len` – Black Feb 16 '18 at 03:46
  • Your question is: Shortest string between two characters. Hello is not the shortest string between `' '` and `(`. If you remove the `min` you will get all strings that match the criteria. – user3483203 Feb 16 '18 at 03:47
  • Use this: `return re.findall(re.escape(a)+"(.*?)"+re.escape(b),text)` if you want all matches. – user3483203 Feb 16 '18 at 03:49
  • That's what I've currently made the change to. Here is an exact example, `def test() { Hello(0)`, this will give `['test', '{ Hello']` – Black Feb 16 '18 at 03:53
  • ignore the `]` at the end of the result in the comment above – Black Feb 16 '18 at 04:01
  • I'm confused as to what the issue is. Do you want to also match matches inside of other matches? – user3483203 Feb 16 '18 at 04:02
  • My aim for any string is to find all substrings that are between two characters. So for the example, `def test() { Hello(0)` I would want `['test', 'Hello']` if the two separators are a single whitespace and `(`. Sorry I noticed that the `shortest` requirement was confusing so I've corrected this – Black Feb 16 '18 at 04:14
  • So you don't want to include characters that are not alphanumeric? Because `{ Hello` is a string that occurs between `' '` and `'('` – user3483203 Feb 16 '18 at 04:20
  • Unfortunately not. Here's a similar example, `} def test2() { Hello(x, 1)` this will produce `} def test2', '{ Hello`. So it's quite tricky. – Black Feb 16 '18 at 05:03
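
One possible way to get `['test2', 'Hello']` for the case discussed in these comments, offered as a sketch and not as the answerer's method: forbid the opening separator from appearing inside the capture, so each match starts at the closest opening separator before the closing one. The function name all_between is hypothetical, and the pattern assumes the captured text never spans a line break.

```python
import re

def all_between(a, b, text):
    # Hypothetical helper. The negative lookahead (?!a) is checked before
    # every captured character, so the capture can never contain the
    # opening separator; a match therefore begins at the nearest one.
    start = re.escape(a)
    end = re.escape(b)
    pattern = start + "((?:(?!" + start + ").)*?)" + end
    return re.findall(pattern, text)

print(all_between(" ", "(", "} def test2() { Hello(x, 1)"))  # ['test2', 'Hello']
print(all_between(" ", "(", "def test() { Hello(0)"))        # ['test', 'Hello']
print(all_between("paste ", "/", '@paste "game_01/01"'))     # ['"game_01']
```

Because the lookahead uses the whole escaped separator, this also works when the opening separator is longer than one character, as in the `paste ` case.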