
I'm learning regular expressions for the first time and ran into the following problem that I'm having trouble solving.

Consider the following paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros
libero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia ; neque 
massa, in consectetur lectus ; faucibus vel. Maecenas ; dapibus leo nec ; elit sagittis 
convallis. Sed at lacus consectetur, eleifend urna tristique, consequat orci. Nullam 
ac orci quis elit pellentesque consectetur quis ac libero. Duis lorem sem, sodales ; ut 
massa sed, porta facilisis ex. Aliquam cursus accumsan ante sed maximus. 

Now I'd like to remove all the text that's enclosed by a pair of semicolons. The catch is that the enclosed text can span multiple lines, AND if a period is reached before a matching semicolon, that text should be retained. For example, the output for the paragraph above should be the following:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros
libero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia faucibus 
vel. Maecenas elit sagittis convallis. Sed at lacus consectetur, eleifend urna tristique, 
consequat orci. Nullam ac orci quis elit pellentesque consectetur quis ac libero. Duis 
lorem sem, sodales ; ut massa sed, porta facilisis ex. Aliquam cursus accumsan ante sed 
maximus. 

After googling around a bit I found re.MULTILINE mode, but I don't think that's what I need. Any help would be appreciated.

Ockham
  • Why downvote? I don't understand. I don't believe this is a duplicate problem – Ockham Feb 03 '16 at 08:39
  • Not sure why you were downvoted. Don't use regex - it's simple enough to write a method that iterates the text and removes anything between semicolons. – Nir Alfasi Feb 03 '16 at 08:40
  • 1
    You need to read [Learning Regular Expressions](http://stackoverflow.com/a/2759417/3832970) and [How to ask](http://stackoverflow.com/help/how-to-ask). The downvote is probably because you showed no efforts of yours to solve the issue. Also, see [Should “Give me a regex that does X” questions be closed?](http://meta.stackoverflow.com/questions/285733/should-give-me-a-regex-that-does-x-questions-be-closed/285739#285739) – Wiktor Stribiżew Feb 03 '16 at 08:41
  • @alfasin I think it would be pretty simple to write a program to do this, but I was just trying to learn more about regular expressions. – Ockham Feb 03 '16 at 08:45
  • @WiktorStribiżew I didn't realize my question was not formatted properly before posting, but thanks for letting me know for the future. – Ockham Feb 03 '16 at 08:46
  • He never mentioned anything about formatting. Read his comment again! – Nir Alfasi Feb 03 '16 at 09:26
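
For comparison, the non-regex approach suggested in the comments above could look roughly like the sketch below. The function name `strip_semicolon_spans` and the short `sample` string are made up for illustration; the rule it applies (drop a `;...;` span unless a period comes first) is taken from the question.

```python
def strip_semicolon_spans(text: str) -> str:
    """Remove ';...;' spans, but keep a ';' if a '.' (or the end) comes before the next ';'."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ";":
            # Look ahead for whichever comes first: another ';' or a '.'
            j = i + 1
            while j < len(text) and text[j] not in ";.":
                j += 1
            if j < len(text) and text[j] == ";":
                i = j + 1  # skip the whole span, including both semicolons
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


# Hypothetical sample, shortened from the paragraph in the question
sample = "In lacinia ; neque \nmassa ; faucibus vel. Duis ; ut massa."
cleaned = strip_semicolon_spans(sample)
print(cleaned)
```

The text on either side of a removed span keeps its own space, so a double space is left behind where the span used to be.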

1 Answer

;[^;.]*;

You can simply use this pattern and replace each match with an empty string. See the demo:

https://regex101.com/r/yX8zV8/3

import re
# A ';', then any run of characters that are neither ';' nor '.', then a closing ';'
p = re.compile(r';[^;.]*;')
test_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros\nlibero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia ; neque \nmassa, in consectetur lectus ; faucibus vel. Maecenas ; dapibus leo nec ; elit sagittis \nconvallis. Sed at lacus consectetur, eleifend urna tristique, consequat orci. Nullam \nac orci quis elit pellentesque consectetur quis ac libero. Duis lorem sem, sodales ; ut \nmassa sed, porta facilisis ex. Aliquam cursus accumsan ante sed maximus. "
subst = ""

# Replace every match with the empty string
result = p.sub(subst, test_str)
print(result)
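
For anyone dissecting the pattern, here is a sketch of the same regex written with `re.VERBOSE` so each piece can carry a comment. The short `sample` string is made up for illustration. The second pattern is compiled with `re.MULTILINE` only to show that the flag makes no difference here: it changes how `^` and `$` behave, and this pattern uses neither. Matching across lines works because a negated character class such as `[^;.]` also matches newline characters.

```python
import re

# Same pattern as in the answer, spelled out piece by piece
pattern = re.compile(r"""
    ;        # opening semicolon
    [^;.]*   # zero or more characters that are neither ';' nor '.', newlines included
    ;        # closing semicolon
""", re.VERBOSE)

# Hypothetical sample: one removable span that crosses a line break,
# and one ';' that is followed by a period before any closing ';'
sample = "In lacinia ; neque \nmassa, in lectus ; faucibus vel. Duis sodales ; ut \nmassa sed, porta ex."

result = pattern.sub("", sample)
print(result)

# re.MULTILINE only affects ^ and $, so the outcome is identical
with_multiline = re.compile(r";[^;.]*;", re.MULTILINE).sub("", sample)
print(with_multiline == result)
```

The spaces on both sides of a removed span are kept, so the result contains a double space where the span was; collapsing those would need a separate step.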
vks
  • 1
    Does this answer help OP learn more about regex? When spoon-feeding, you must explain every detail of your pattern. Also, explain why OP does not really need `re.M`. – Wiktor Stribiżew Feb 03 '16 at 08:47
  • @WiktorStribiżew I will once he tries and comes up with some questions... if I explain everything beforehand, that would be spoon-feeding – vks Feb 03 '16 at 08:48
  • Providing an answer for a question showing no effort *is* spoon-feeding. – Wiktor Stribiżew Feb 03 '16 at 08:49
  • 1
    I'll dissect this and once I understand I'll accept. Thanks for your help. No spoon-feeding is required, I just needed an example to work through – Ockham Feb 03 '16 at 08:49
  • No problem! We all learn by example sometimes, but I will look at how to ask a question properly next time – Ockham Feb 03 '16 at 08:51
  • 1
    I believe I understand! So you 1) match a semicolon, 2) match any character that's NOT a semicolon or a period, 0 to inf times, and 3) finally match the closing semicolon. I believe that's correct, but correct me if I'm wrong. Just out of curiosity, would this problem become a lot more difficult if words like hot;dog (i.e. hyphenated with a semicolon) were allowed? For example, I don't think this pattern would match ("Lorem ipsum dolor sit ; consectur adipiscing elit hot;dog elit donec ; eros.") Thanks again! – Ockham Feb 03 '16 at 09:09
  • @Ockham that's correct. For the second use case, use `;(?:[^;.]|(?<=\w);(?=\w))*;` (see demo: https://regex101.com/r/yX8zV8/4) – vks Feb 03 '16 at 09:13
  • 1
    OK, this one might take me longer to dissect, and it's really late where I live and I need to sleep, haha. Thanks again for your help! – Ockham Feb 03 '16 at 09:14
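
To help with dissecting the second pattern, here is a small sketch that runs both regexes from this thread on the hot;dog sentence from the comments. The variable names are made up for illustration. The extra alternative `(?<=\w);(?=\w)` lets the middle of the match include a semicolon only when it has a word character directly on both sides.

```python
import re

simple = re.compile(r";[^;.]*;")
extended = re.compile(r";(?:[^;.]|(?<=\w);(?=\w))*;")

text = "Lorem ipsum dolor sit ; consectur adipiscing elit hot;dog elit donec ; eros."

# The simple pattern treats the ';' inside hot;dog as the closing semicolon,
# so it does match, but it removes the wrong span
simple_result = simple.sub("", text)
print(simple_result)

# The extended pattern steps over the word-internal ';' and stops at the real closing one
extended_result = extended.sub("", text)
print(extended_result)
```

One caveat: only the semicolons in the middle of a match are checked this way. The opening `;` is unguarded, so a word-internal semicolon that appears before any standalone one can still be picked up as the start of a span.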