Python regex removing word with regex

Question

I'm new to Python and I want to remove the ({ / / }) parts and replace them with a space, as in the sample below.

The original sentence:

NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })

Transform to this:

Regina Shueller works for Italy 's La Repubblica newspaper.

I've tried this code, but it did not give the result I expected:

sentence = re.sub(r'[({ / / })]', ' ', sentence)

The best I came up with is [`r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'`](https://regex101.com/r/nJ4yY8/1) (to be replaced with a space). But the space between the last word and the `.` cannot be removed this way, and the result must still be trimmed of surrounding spaces. — Wiktor Stribiżew, Jan 21 '16 at 16:24
Your transformed string does not match what you say you want — Padraic Cunningham, Jan 21 '16 at 16:34
Try [like this](https://regex101.com/r/wF4nS6/2) with the [Python regex module](https://pypi.python.org/pypi/regex) (the pattern uses the group-1 subroutine call `(?1)`). Or, with `re`, [this pattern](https://regex101.com/r/tC1sJ0/1): `\({[^}]*}\)|NULL|\s+(?!\w)`, then trim the leading space. — bobble bubble, Jan 21 '16 at 17:08
Thank you so much @WiktorStribiżew for your answer, that regex works well. — iqra sadra, Jan 22 '16 at 09:21

score 1 · Answer 1 · answered Jan 21 '16 at 16:39


The pattern you tried: r'[({ / / })]' means:

Match any single character that is one of (, {, space, /, }, or )
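A quick way to see this is to run the original attempt on a fragment of the sample (the fragment is chosen only for illustration). Each bracket, brace, slash and space is replaced on its own, so the numbers and p1/p2 tokens inside the marker survive.

```python
import re

# The pattern from the question: a character class, i.e. "any ONE of these characters"
attempt = r'[({ / / })]'

# A fragment of the sample, chosen for illustration
fragment = "Regina ({ 4 p1 p2 / / })"
result = re.sub(attempt, ' ', fragment)

# Every (, {, space, /, } and ) became a space on its own,
# so the digits and p1/p2 inside the marker are left behind
print(result.split())  # ['Regina', '4', 'p1', 'p2']
```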

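The key to this is understanding the regular expression language. Square brackets create a character class, which matches one character at a time. Outside a class, ( and ) form groups and { } form quantifiers, so they need escaping (for example \( and \{) to be matched literally.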

A pattern such as r' \({ [^/]*/ / }\) ' would match each of the different sections in your example, except the last one, which is not followed by a space.
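To check this against the full sample, the sketch below counts matches with re.findall. The sample contains 11 markers. The relaxed variant, which swaps the literal spaces at both ends for \s*, is an adjustment added for illustration and not part of this answer.

```python
import re

# Sample input copied from the question
s = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
     "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
     "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

# Pattern from this answer: literal spaces on both sides of the marker
literal = r' \({ [^/]*/ / }\) '
# Illustrative variant (not from the answer): optional whitespace on both sides
relaxed = r'\s*\({ [^/]*/ / }\)\s*'

print(len(re.findall(literal, s)))  # 10: the final "({ 38 / / })" has no trailing space
print(len(re.findall(relaxed, s)))  # 11: every marker
```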


dsh


That's right! I should learn regular expressions in more depth. Thanks for your response! – iqra sadra Jan 22 '16 at 09:23

score 0 · Answer 2 · answered Jan 21 '16 at 16:23


You can go with this:

r'(\([^(]*\))'

With live demo
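Applied to the full sample, this pattern removes every ({ ... }) marker but leaves NULL and doubled spaces in place, so a whitespace cleanup is still needed. The second call uses a made-up string (not from the question) to illustrate the safety concern raised below: the greedy [^(]* can run past a stray ) that is not part of a marker.

```python
import re

# Sample input copied from the question
s = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
     "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
     "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

# Remove every "(...)" group: [^(]* stops before the next "(" and backtracks to the last ")"
stripped = re.sub(r'(\([^(]*\))', '', s)
# Collapse the doubled spaces left behind
cleaned = " ".join(stripped.split())
print(cleaned)  # NULL Regina Shueller works for Italy 's La Repubblica newspaper .

# Hypothetical input with a stray ")" outside any marker: the match runs past it
odd = "a (b) c) d"
print(re.sub(r'(\([^(]*\))', '', odd))  # a  d
```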


Thomas Ayoub



I think this regex is rather unsafe for this task. – Wiktor Stribiżew Jan 21 '16 at 16:26

@WiktorStribiżew well... It fits the need, given the provided input. I've simplified it as far as I could, which might be bad if the input provided doesn't reflect the reality. – Thomas Ayoub Jan 21 '16 at 16:28

Padraic Cunningham · Answer 3 · 2016-01-21T16:38:31.070


If the format is always the same, you could try keeping only the alphabetic tokens after stripping punctuation:

from string import punctuation
print(" ".join([w for w in s.split() if w.strip(punctuation).isalpha()]))
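Running the split-based version on the question's sample (bound to s here, as the answer assumes) shows exactly what it keeps: NULL survives because it is alphabetic, and the final . is dropped because stripping punctuation from it leaves an empty string.

```python
from string import punctuation

# Sample input from the question, bound to `s` as the answer assumes
s = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
     "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
     "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

# Keep a token only if it is purely alphabetic once leading/trailing punctuation is stripped
words = [w for w in s.split() if w.strip(punctuation).isalpha()]
print(" ".join(words))  # NULL Regina Shueller works for Italy 's La Repubblica newspaper
```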

Or using a regex:

print(re.sub(r'\({.*?}\)',"",s))
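On the sample, the regex version removes every ({ ... }) marker, again leaving NULL and extra spaces behind. The second call uses a hypothetical malformed string, where one marker is never closed, to illustrate the lazy-dot concern from the comments: .*? keeps extending until it reaches the next }) it can find.

```python
import re

# Sample input copied from the question
s = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
     "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
     "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

# Lazy .*? stops at the first "})" after each "({"
out = re.sub(r'\({.*?}\)', "", s)
print(" ".join(out.split()))  # NULL Regina Shueller works for Italy 's La Repubblica newspaper .

# Hypothetical malformed input: the first marker is never closed,
# so .*? runs on into the next marker before it finds "})"
broken = "x ({ a y ({ b }) z"
print(re.sub(r'\({.*?}\)', "", broken))  # x  z
```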

In your expected output you remove everything wrapped in ({ }) regardless of what is inside it.

edited Jan 21 '16 at 16:38

answered Jan 21 '16 at 16:30

Padraic Cunningham


The lazy dot matching regex [may play a bad joke on you](https://regex101.com/r/dQ4zT0/1). Do not use lazy dot matching where you do not have to. – Wiktor Stribiżew Jan 21 '16 at 16:37
@WiktorStribiżew, I do need it. I meant to remove the / / from the pattern, as it is not what the OP is looking to match based on their expected output. What is inside is irrelevant – Padraic Cunningham Jan 21 '16 at 16:43

Wiktor Stribiżew · Accepted Answer · 2016-05-17T22:36:21.710

You can use

r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'

See regex demo

Regex explanation:

\s* - zero or more whitespaces
(?:\(\{[^/]*/\s*/\s*}\)|NULL) - two alternatives, NULL or \(\{[^/]*/\s*/\s*}\) matching...
- \( - opening round bracket
- \{ - opening brace
- [^/]* - zero or more characters other than /
- / - a literal /
- \s* - zero or more whitespaces
- /\s* - ibid.
- } - a closing brace
- \) - a closing round bracket
\s* - zero or more whitespaces
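The explanation above maps one-to-one onto a re.VERBOSE spelling of the same pattern. The sketch below is only a readability rewrite (the COMPACT and VERBOSE names are illustrative); it should behave identically to the one-line pattern on the sample.

```python
import re

# The accepted answer's pattern, unchanged
COMPACT = re.compile(r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*')

# The same pattern in re.VERBOSE form; unescaped whitespace and # comments are ignored
VERBOSE = re.compile(r"""
    \s*            # zero or more whitespaces
    (?:            # non-capturing group, two alternatives:
        \(\{       #   opening round bracket, opening brace
        [^/]*      #   zero or more characters other than /
        /\s*       #   a literal / and zero or more whitespaces
        /\s*       #   ibid.
        }\)        #   closing brace, closing round bracket
      | NULL       #   or the literal word NULL
    )
    \s*            # zero or more whitespaces
""", re.VERBOSE)

sample = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
          "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
          "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

print(VERBOSE.sub(" ", sample).strip())
```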

Note that the spaces in between words and punctuation should be handled separately.

Python demo:

import re
p = r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'
test_str = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })"
result = re.sub(p, " ", test_str)
print(result.strip())
# => Regina Shueller works for Italy 's La Repubblica newspaper .

As a bonus :), try removing the space before non-opening punctuation and symbols with `re.sub(r"\s+([~\`!@#$%^&*)_+=}\]\\|;:.>,-])", r"\1", result.strip())` — Wiktor Stribiżew, Jan 22 '16 at 09:32
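Putting the accepted pattern and the punctuation pattern from the comment above together gives the exact output asked for in the question, including Italy 's (the apostrophe is not in the character class, so that space is kept). clean_alignment is a hypothetical helper name; both patterns are copied from the answer and the comment.

```python
import re

# Accepted-answer pattern: a "({ ... / / })" marker or NULL, with surrounding whitespace
MARKER = re.compile(r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*')
# Pattern from the comment: whitespace before non-opening punctuation and symbols
SPACE_BEFORE_PUNCT = re.compile(r"\s+([~\`!@#$%^&*)_+=}\]\\|;:.>,-])")

def clean_alignment(text: str) -> str:
    """Hypothetical helper: drop the markers, then glue punctuation to the previous word."""
    result = MARKER.sub(" ", text).strip()
    return SPACE_BEFORE_PUNCT.sub(r"\1", result)

sample = ("NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) "
          "works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) "
          "La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })")

print(clean_alignment(sample))  # Regina Shueller works for Italy 's La Repubblica newspaper.
```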
