I'm trying to figure out how to represent the following regex in python:

Find the first occurrence of {any character that isn't a letter}'{unlimited amount of any character including '}'{any character that isn't a letter}

For example:

She said 'Hello There!'.
`he Looked. 'I've been sick' and then...`

My question is how do I implement the middle part? How do I represent an unlimited number of characters until the pattern at the end is found (`_)?

reo
  • Your second example doesn't match the pattern you've given. – melpomene Nov 11 '18 at 11:58
  • How so? it shouldn't detect the ' unless there's a non-letter afterwards, so the I've wouldn't end the pattern – reo Nov 11 '18 at 12:00
  • Hi Tom, welcome to Stack Overflow. Did you have a look at the [python re module documentation](https://docs.python.org/3/library/re.html)? Do you need an unlimited amount or an unlimited amount bigger than zero? Do you want to match as many or as few characters as possible? – MisterMiyagi Nov 11 '18 at 12:02
  • Yes, I have. However, I can't figure out how to implement it because of cases like the second one. How do I make it match patterns like ` 'I've been sick' and then...`? There's a ' in the middle of the pattern that shouldn't prevent it from detecting the correct pattern. – reo Nov 11 '18 at 12:08
  • Your pattern starts by matching a non-letter followed by `'`. In your second example there are three `'`, but none of them has a non-letter in front of it. – melpomene Nov 11 '18 at 12:09
  • The last example would have a whitespace before its first ': ` 'I've been sick' she said`. I think my pattern wasn't correct to begin with: it shouldn't disallow ' in between the two main 's. I don't think this is a duplicate of this issue, as it raises a few issues beyond just matching any character but a specific one. – reo Nov 11 '18 at 12:27
  • I mean, the code is just `[^a-zA-Z]'.*?'[^a-zA-Z]`. This is not a very interesting question. – melpomene Nov 11 '18 at 12:29
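For reference, here is a minimal sketch of how the pattern from the last comment behaves on the two example strings from the question. The variable names are only illustrative, and the second string assumes the version of the example that has a space before the first quote.

```python
import re

# Pattern from the comment above: a non-letter, a quote, as few characters
# as possible, a quote, and another non-letter.
pattern = re.compile(r"[^a-zA-Z]'.*?'[^a-zA-Z]")

examples = [
    "She said 'Hello There!'.",
    "he Looked. 'I've been sick' and then...",
]

for text in examples:
    match = pattern.search(text)
    print(repr(match.group()) if match else None)
# " 'Hello There!'."
# " 'I've been sick' "
```

The apostrophe in `I've` does not end the match because it is followed by a letter, so the lazy `.*?` keeps extending until it reaches a quote followed by a non-letter.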

1 Answer

There are a few different ways you can represent an indefinite number of characters:

  • `*`: zero or more of the preceding element (greedy)
  • `+`: one or more of the preceding element (greedy)
  • `*?`: zero or more of the preceding element (non-greedy)
  • `+?`: one or more of the preceding element (non-greedy)

"Greedy" means that as many characters as possible will be matched. "Non-greedy" means that as few characters as possible will be matched. (For more explanation on greedy and non-greedy, see this answer.)

In your case, it sounds like you want to match one or more characters, and for the match to be non-greedy, so you need `+?`.
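To make the difference between the four quantifiers concrete, here is a small sketch. The sample strings are made up for illustration and the patterns are deliberately simpler than the one you need.

```python
import re

text = "She said 'foo' and 'bar'."

# Greedy: .* grabs as much as possible, so the match runs to the last quote.
greedy = re.search(r"'.*'", text).group()
# Non-greedy: .*? grabs as little as possible, so the match stops at the next quote.
lazy = re.search(r"'.*?'", text).group()

print(greedy)  # 'foo' and 'bar'
print(lazy)    # 'foo'

# * accepts empty quotes, + requires at least one character between them.
empty = "She said ''."
print(re.search(r"'.*?'", empty).group())  # ''
print(re.search(r"'.+?'", empty))          # None
```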

In Python code:

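import re
# \W is a non-word character; [^']+? is one or more non-quote characters (non-greedy)
my_regex = re.compile(r"\W'[^']+?'\W")
match = my_regex.search("She said 'Hello There!'.")
print(repr(match.group()))  # " 'Hello There!'."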

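This regex won't match your second example, `'I've been sick' and then...`, when the string starts with the quote, as there is no non-word character before the first `'`. Even with a leading space it still fails, because `[^']` cannot cross the apostrophe in `I've`.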

Jack Taylor
  • Why do you think it's one or more characters? "Unlimited amount" sounds like it would include 0. – melpomene Nov 11 '18 at 12:11
  • Your regex doesn't match `4'-'4`. – melpomene Nov 11 '18 at 12:12
  • @melpomene I was guessing that it was one or more characters based on the examples in the question. It doesn't sound like Tom wants to match empty quotes. Not sure what you're getting at with `4'-'4`. Why should the regex match that? – Jack Taylor Nov 11 '18 at 12:21
  • Because it's a non-letter, followed by `'`, followed by something that's not `'`, followed by `'`, followed by a non-letter. – melpomene Nov 11 '18 at 12:22
  • The last example would have a whitespace before its first ': ` 'I've been sick' she said`. I think my pattern wasn't correct to begin with: it shouldn't disallow ' in between the two main 's. – reo Nov 11 '18 at 12:24
  • @melpomene Well, I suppose you could use `[^\w\d]` instead of `\W` if you really want to catch digits before the first quote mark. Whether @Tom wants to do that or not is up to him. Tom, if you want to know exactly which characters `\W` matches, you can check in the [re module documentation](https://docs.python.org/3/library/re.html). – Jack Taylor Nov 11 '18 at 12:31
  • `[^\w\d]` is equivalent to `[^\w]` is equivalent to `\W`. – melpomene Nov 11 '18 at 12:33
  • @Tom In that case it sounds like you want a greedy match rather than a non-greedy match. Be aware, though, that using greedy matches will mean matching multiple quotes if there are multiple quotes in your string. i.e., for `She said 'foo' and 'bar'.` a greedy match would match ` 'foo' and 'bar'.`. – Jack Taylor Nov 11 '18 at 12:34
  • @melpomene Hm, yes, you're right about that - my bad. `\w` includes digits, so `[\w\d]` is the same as `\w`, and `\W` is its inverse. If you want to specifically match digits, then, it becomes difficult, as you have to account for all the letter characters that aren't in ASCII (for example `á`), and you can't capture those with a simple `[a-zA-Z]`. – Jack Taylor Nov 11 '18 at 12:39
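The points about `\W`, digits, and non-ASCII letters in the comments above can be checked with a short sketch. The `[\W\d_]` class at the end is not from this thread. It is one possible way to write "anything that is not a letter" and assumes Python 3 `str` patterns without the `re.ASCII` flag.

```python
import re

# \w already includes digits (and the underscore), so \W and [^\w\d]
# both reject the "4" in front of the quote.
w_results = [
    re.search(pattern, "4'-'4")
    for pattern in (r"\W'[^']+?'\W", r"[^\w\d]'[^']+?'[^\w\d]")
]
print(w_results)  # [None, None]

# An ASCII-only letter class treats digits as non-letters...
ascii_letters = re.compile(r"[^a-zA-Z]'[^']+?'[^a-zA-Z]")
print(ascii_letters.search("4'-'4").group())  # 4'-'4
# ...but it also treats "á" as a non-letter, so this matches too.
print(ascii_letters.search("á'x'á") is not None)  # True

# Assumption: [\W\d_] means "non-word character, digit, or underscore",
# which leaves out letters, including non-ASCII ones such as "á".
non_letter = re.compile(r"[\W\d_]'[^']+?'[\W\d_]")
print(non_letter.search("4'-'4").group())  # 4'-'4
print(non_letter.search("á'x'á"))          # None
```

Whether digits and underscores should count as valid characters around the quotes depends on what the asker actually wants, so treat the last pattern as a starting point rather than a drop-in answer.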