
How can I replace words that have an alternative written as '(or something)' in parentheses?

Example:

string = "acısı içine (or yüreğine) çökmek (or işlemek)"

Expected Output:

'acısı içine çökmek', 'acısı yüreğine çökmek', 'acısı içine işlemek', 'acısı yüreğine işlemek'

I am trying to write something like the code below, but it does not work if there is more than one set of parentheses.

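import re

word = 'abat etmek (or eylemek)'
# remove the parenthesised part, leaving 'abat etmek '
item1 = re.sub(r"\([^)]+\)", "", word)
# text inside the first pair of parentheses: 'or eylemek'
parenthesis = re.search(r'\(([^)]+)', word).group(1)
par = parenthesis.split('or')
# swap the last word for the alternative; handles only one '(or ...)' group
item2 = item1.replace(item1.split()[-1], par[1])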
Gülnur K.
  • You can get some regex help here: https://stackoverflow.com/questions/4736/learning-regular-expressions?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – Grant Williams May 01 '18 at 14:24
  • Welcome to Stack Overflow. While there are people willing to help, you should at least show some effort of your own. What code were you using, and where did you get stuck? IMO, this calls for a two-step approach: getting the words in question with e.g. `import re; rx = re.compile(r'\(or\s+([^()]+)\)\s*'); words = [m.group(1) for m in rx.finditer(string)]`, and then a set of rules for what to do afterwards. Lastly (most people here do not speak Turkish), you are more likely to get an answer with English sentences. – Jan May 01 '18 at 14:35
  • Being more specific about the problem made this a better question. – doctorlove May 01 '18 at 15:30

1 Answer

import re
import itertools

text = "acısı içine (or yüreğine) çökmek (or işlemek)"
#  splitting the string into "combinable" parts
#  (raw string, so the backslashes reach the regex engine unchanged)
pattern = re.compile(r'\w+ \(or \w+\)|\w+')
parts = pattern.findall(text)
#  parts = ['acısı', 'içine (or yüreğine)', 'çökmek (or işlemek)']

#  replacing each part with a list of possible options (one or two)
parts = [part.rstrip(')').split(' (or ') for part in parts]
#  parts = [['acısı'], ['içine', 'yüreğine'], ['çökmek', 'işlemek']]

#  producing all possible combinations
result = [' '.join(p) for p in itertools.product(*parts)]
#  result = ['acısı içine çökmek', 'acısı içine işlemek', 'acısı yüreğine çökmek', 'acısı yüreğine işlemek']
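The steps above can be wrapped in a small reusable function. This is only a sketch: `expand_alternatives` is a hypothetical name, not part of the original answer, and it assumes every alternative is a single word written exactly as `word (or other)`.

```python
import itertools
import re


def expand_alternatives(phrase):
    """Return every phrase obtained by picking one option per '(or ...)' group."""
    # Each part is either 'word (or alternative)' or a plain word.
    parts = re.findall(r'\w+ \(or \w+\)|\w+', phrase)
    # Turn each part into its list of options (one or two entries).
    options = [part.rstrip(')').split(' (or ') for part in parts]
    # The Cartesian product picks one option from every part.
    return [' '.join(combo) for combo in itertools.product(*options)]


print(expand_alternatives('abat etmek (or eylemek)'))
# ['abat etmek', 'abat eylemek']
```

Multi-word alternatives such as `(or bir şey)` would need a looser pattern than `\w+`, so this sketch should not be read as a general solution.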
nutic
  • Thanks for the solution, but I think the regex does not work, because the list named parts comes back as [[''], [''], [''], [''], [''], [''], ['']] – Gülnur K. May 01 '18 at 17:26
  • Worked for me on a Docker image created from python:latest. Which Python version and which system are you using? Maybe it has something to do with the encoding? Can you try a string of purely Latin letters (like 'a b (or c) d (or e) f')? – nutic May 01 '18 at 17:53
  • I am using Python 3 in PyCharm, and I tested the code with different strings. The regex also does not work on [link](https://pythex.org/). I don't understand what is missing. – Gülnur K. May 01 '18 at 18:08
  • Ah! The escaping of the brackets got lost while copying. Fixed the answer above. – nutic May 01 '18 at 18:16
  • Yes, now it works fine. Thanks a lot for your help. – Gülnur K. May 01 '18 at 19:17
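
The seven empty strings reported in the comments are consistent with the parentheses being unescaped. The sketch below assumes the broken pattern was the same one with its backslashes dropped (the broken text itself is not shown here). Without the backslashes, `(or \w+)` becomes a capturing group, `re.findall` returns the group's content instead of the whole match, and the group is empty whenever the plain `\w+` alternative matches.

```python
import re

text = "acısı içine (or yüreğine) çökmek (or işlemek)"

# Unescaped: '(or \w+)' is a capturing group, so findall returns its content.
# The first alternative never matches here (the text has a literal '('),
# so every match comes from the plain '\w+' branch with an empty group.
unescaped = re.findall(r'\w+ (or \w+)|\w+', text)
print(unescaped)
# ['', '', '', '', '', '', '']

# Escaped: '\(' and '\)' match literal parentheses, there is no group,
# and findall returns the full matches.
escaped = re.findall(r'\w+ \(or \w+\)|\w+', text)
print(escaped)
# ['acısı', 'içine (or yüreğine)', 'çökmek (or işlemek)']
```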