How to remove text before a particular character or string in multi-line text?

Question

I want to remove all the text before and including */ in a string.

For example, consider:

string = ''' something
other things
etc. */ extra text. 
'''

Here I want extra text. as the output.

I tried:

string = re.sub("^(.*)(?=*/)", "", string)

I also tried:

string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)

But when I print string, the operation has not been applied and the whole string is printed.

@depperm not necessarily. But I am using regex to remove lines in between two chars, lines start with a particular char or string. So I thought I can accomplish the question above with regex. — NewCoder, Aug 05 '19 at 18:46
the problem is that `.` ignores newlines. try this pattern: `(\n|.)+\*/`. Also what happened to the space before `"extra text"`? — pault, Aug 05 '19 at 18:49
@NewCoder and if you have multiple `*/` occurrences in text? — RomanPerekhrest, Aug 05 '19 at 18:51
I have a feeling that you're trying to parse C++ code with python. [Are you writing your own compiler?](https://stackoverflow.com/questions/1444961/is-there-a-good-python-library-that-can-parse-c). — pault, Aug 05 '19 at 19:16
`string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)` works — , Aug 05 '19 at 20:03

Akaisteph7 · Answer 1 · 2019-08-05T19:16:56.137

1

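The problem with your first regex is that . does not match newlines by default, as you noticed; the * inside the lookahead also needs to be escaped as \*. Your second one was closer, but it is missing the * quantifier after the dot, so ^.\*/ only matches a single character followed by */. This would work: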

string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)

You can also just get the part of the string that comes after your "*/":

string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
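One of the comments asks what happens when the text contains several `*/` occurrences. The following is a minimal sketch of the difference, using a made-up two-marker string (`text` is an assumption, not the asker's data): the greedy `.*` cuts up to the last `*/`, while the non-greedy `.*?` cuts only up to the first one.

```python
import re

# Hypothetical input with two "*/" markers, for illustration only.
text = "a */ b\nc */ d\n"

# Greedy: with DOTALL, .* runs as far as possible, so the match ends at the LAST "*/".
after_last = re.sub(r"^.*\*/", "", text, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST "*/"; count=1 limits it to one substitution.
after_first = re.sub(r"^.*?\*/", "", text, count=1, flags=re.DOTALL)

print(repr(after_last))   # ' d\n'
print(repr(after_first))  # ' b\nc */ d\n'
```

Which of the two is wanted depends on the input; the original example has only one `*/`, so both give the same result there.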

edited Aug 05 '19 at 19:16

answered Aug 05 '19 at 18:46

Akaisteph7


score 1 · Answer 2 · answered Aug 05 '19 at 18:46

1

I suppose you're fine without regular expressions:

string[string.index("*/ ")+3:]

And if you want to strip that newline:

string[string.index("*/ ")+3:].rstrip()
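Note that `str.index` raises `ValueError` when the marker is absent, and the slice above assumes the marker is followed by exactly one space. As a possible alternative (a different technique from the one in this answer), here is a small sketch using `str.partition`; the helper name `text_after` is hypothetical.

```python
def text_after(text, marker="*/"):
    """Return what follows the first marker, or the text unchanged if it is absent."""
    before, sep, after = text.partition(marker)
    # sep is the empty string when the marker was not found.
    return after if sep else text


string = " something\nother things\netc. */ extra text. \n"

print(repr(text_after(string).strip()))    # 'extra text.'
print(repr(text_after("no marker here")))  # 'no marker here'
```

Returning the input unchanged when the marker is missing is a design choice made for this sketch; raising an error or returning an empty string may suit other cases better.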

answered Aug 05 '19 at 18:46

ipaleka


pault · Answer 3 · 2019-08-05T19:35:29.170

Update: After doing some research, I found that the pattern (\n|.) to match everything including newlines is inefficient. I've updated the answer to use [\s\S] instead as shown on the answer I linked.

The problem is that . in Python regex matches everything except newlines by default. For a regex solution, you can do the following:

import re

strng = ''' something
other things
etc. */ extra text. 
'''

print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.

Add in a .strip() if you want to remove that remaining leading whitespace.
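To make the newline behaviour concrete, here is a short sketch comparing three variants on the same input: the default `.`, the `re.DOTALL` flag, and the `[\s\S]` character class. The variable names are chosen for illustration only.

```python
import re

strng = " something\nother things\netc. */ extra text. \n"

# Default: "." cannot cross "\n", so only the part of the last line up to "*/" is removed.
default = re.sub(r".+\*/", "", strng)

# Either of these lets the match span multiple lines.
dotall = re.sub(r".+\*/", "", strng, flags=re.DOTALL)
charclass = re.sub(r"[\s\S]+\*/", "", strng)

print(repr(default))         # ' something\nother things\n extra text. \n'
print(repr(dotall))          # ' extra text. \n'
print(dotall == charclass)   # True
print(repr(dotall.strip()))  # 'extra text.'
```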

score 0 · Answer 4 · answered Aug 05 '19 at 18:49

0

To keep the text before that symbol instead, you can do:

split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)

which gives you:

 something
other things
etc.

answered Aug 05 '19 at 18:49

Anna Nevison


score 0 · Answer 5 · answered Aug 05 '19 at 19:10

0

string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)

This gives the following output (shown as its repr):

' extra text. \n'

answered Aug 05 '19 at 19:10

Gourav Bansal

