I'm new to Python and I want to remove every ({ ... / / }) annotation from a sentence, replacing each one with a space, as in the sample below.

The original sentence:

NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })

Transform to this:

Regina Shueller works for Italy 's La Repubblica newspaper.

I've tried this code, but the result was not what I expected:

Sentence = re.sub(r'[({ / / })]',' ', sentence)
Wiktor Stribiżew
iqra sadra
  • The best I came up with is [`r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'`](https://regex101.com/r/nJ4yY8/1) (to be replaced with space). But the space between the last word and the `.` cannot be removed like this. And the value must be trimmed from spaces. – Wiktor Stribiżew Jan 21 '16 at 16:24
  • Your transformed string does not match what you say you want – Padraic Cunningham Jan 21 '16 at 16:34
  • Try [like this](https://regex101.com/r/wF4nS6/2) with [Python regex module](https://pypi.python.org/pypi/regex) (pattern uses backreference `(?1)`). Or with `re` [this pattern](https://regex101.com/r/tC1sJ0/1): `\({[^}]*}\)|NULL|\s+(?!\w)` and trim leading space. – bobble bubble Jan 21 '16 at 17:08
  • Thank you so much @WiktorStribiżew for your answer, that regex works well. – iqra sadra Jan 22 '16 at 09:21
  • Thanks @bobblebubble – iqra sadra Jan 22 '16 at 09:21

4 Answers

1

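The pattern you tried, r'[({ / / })]', means:

Match any single character that is one of (, {, a space, /, }, or )

The key is understanding the regular expression language: square brackets define a character class, which matches a single character from the set rather than the sequence as written. Outside a character class, ( and ) have a special meaning (grouping), so they must be escaped with a backslash to match literally.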

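A pattern such as r' \({ [^/]*/ / }\) ' would match each of the sections in your example that is followed by a space. The final section, at the very end of the string, has no trailing space and would need separate handling.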
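
To make the difference concrete, here is a minimal sketch comparing the character class from the question with the escaped sequence pattern. The variable names and the shortened sample string are illustrative assumptions, not part of the original post.

```python
import re

# Shortened version of the sample sentence (an illustrative assumption).
sentence = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) works ({ / / }) for"

# Character class: replaces every single (, {, space, /, } or ) on its own,
# so the numbers and p1/p2 labels inside the annotations are left behind.
char_class = re.sub(r'[({ / / })]', ' ', sentence)

# Escaped sequence: matches a whole " ({ ... / / }) " section at once.
sequence = re.sub(r' \({ [^/]*/ / }\) ', ' ', sentence)

print(repr(char_class))
print(repr(sequence))
```

The second call leaves "NULL Regina works for"; the leading NULL still has to be dealt with separately.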

dsh
0

You can go with this:

r'(\([^(]*\))'

With live demo
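
As a rough illustration of how that pattern could be applied with re.sub, here is a short sketch. The sample string is shortened and the extra whitespace clean-up step is an assumption added for readability, not part of the pattern itself.

```python
import re

# Shortened sample in the same format as the question (illustrative).
s = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works"

# Remove each parenthesised group matched by the pattern.
without_groups = re.sub(r'(\([^(]*\))', '', s)

# Removing the groups leaves double spaces behind, so collapse them.
cleaned = ' '.join(without_groups.split())
print(cleaned)
```

Note that this keeps the leading NULL, since the pattern only targets the parenthesised groups.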

Thomas Ayoub
0

If the format is always the same, you could try keeping only the alphabetic tokens after stripping punctuation:

from string import punctuation

# s holds the annotated sentence; keep only tokens that are purely alphabetic
print(" ".join(w for w in s.split() if w.strip(punctuation).isalpha()))

Or using a regex:

import re
print(re.sub(r'\({.*?}\)', "", s))

Judging by your expected output, you are removing every ({ ... }) group regardless of what is inside it.
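
The two approaches do not behave identically, which the following sketch tries to show side by side. The shortened sample string, the variable names, and the final whitespace-collapsing step are assumptions made for illustration.

```python
import re
from string import punctuation

# Shortened sample in the same format as the question (illustrative).
s = "NULL ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) newspaper ({ 18 / / }) . ({ 38 / / })"

# Token filter: keep tokens that are alphabetic once punctuation is stripped.
by_tokens = " ".join(w for w in s.split() if w.strip(punctuation).isalpha())

# Regex: delete each ({ ... }) group, then collapse the leftover spaces.
by_regex = " ".join(re.sub(r'\({.*?}\)', "", s).split())

print(by_tokens)
print(by_regex)
```

Both variants keep the leading NULL. The token filter also drops the final full stop, because a bare "." is not alphabetic, while the regex variant keeps it.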

Padraic Cunningham
  • The lazy dot matching regex [may play a bad joke on you](https://regex101.com/r/dQ4zT0/1). Do not use lazy dot matching where you do not have to. – Wiktor Stribiżew Jan 21 '16 at 16:37
  • @WiktorStribiżew, I do need it. I meant to remove the / / from the pattern, as it is not what the OP is looking to match based on their expected output. What is inside is irrelevant. – Padraic Cunningham Jan 21 '16 at 16:43
0

You can use

r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'

See regex demo

Regex explanation:

  • \s* - zero or more whitespaces
  • (?:\(\{[^/]*/\s*/\s*}\)|NULL) - two alternatives, NULL or \(\{[^/]*/\s*/\s*}\) matching...
    • \( - opening round bracket
    • \{ - opening brace
    • [^/]* - zero or more characters other than /
    • / - a literal /
    • \s* - zero or more whitespaces
    • /\s* - again, a literal / followed by zero or more whitespaces
    • } - a closing brace
    • \) - a closing round bracket
  • \s* - zero or more whitespaces

Note that the spaces in between words and punctuation should be handled separately.

Python demo:

import re
p = r'\s*(?:\(\{[^/]*/\s*/\s*}\)|NULL)\s*'
test_str = "NULL ({ / / }) Regina ({ 4 p1 p2 / / }) Shueller ({ 5 p1 p2 / / }) works ({ / / }) for ({ / / }) Italy ({ 14 / / }) 's ({ 15 / / }) La ({ 16 / / }) Repubblica ({ 17 / / }) newspaper ({ 18 / / }) . ({ 38 / / })"
result = re.sub(p, " ", test_str)
print(result.strip())
# => Regina Shueller works for Italy 's La Repubblica newspaper .
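
One possible way to handle the remaining space before punctuation is a second substitution, sketched below. The character set in the lookahead is an assumption about which punctuation marks can occur; adjust it to the data.

```python
import re

# Output of the demo above, after the annotations have been removed.
result = "Regina Shueller works for Italy 's La Repubblica newspaper ."

# Drop whitespace that sits directly before sentence punctuation.
# The lookahead keeps the punctuation mark itself in place.
tidied = re.sub(r"\s+(?=[.,;:!?])", "", result)
print(tidied)
# => Regina Shueller works for Italy 's La Repubblica newspaper.
```

The space before 's is left alone here, which matches the expected output given in the question.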
Wiktor Stribiżew