
I want to remove all the text up to and including `*/` in a string.

For example, consider:

string = ''' something
other things
etc. */ extra text. 
'''

Here I want `extra text.` as the output.

I tried:

string = re.sub("^(.*)(?=*/)", "", string)

I also tried:

string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)

But when I print `string`, the operation I wanted has not been performed and the whole string is printed.

Asked by NewCoder; edited by Akaisteph7
  • Does this need to be done with regex? – depperm Aug 05 '19 at 18:44
  • @depperm Not necessarily. But I am already using regex to remove lines between two characters and lines that start with a particular character or string, so I thought I could accomplish this with regex too. – NewCoder Aug 05 '19 at 18:46
  • The problem is that `.` does not match newlines. Try this pattern: `(\n|.)+\*/`. Also, what happened to the space before `"extra text"`? – pault Aug 05 '19 at 18:49
  • @pault That worked perfectly. – NewCoder Aug 05 '19 at 18:51
  • @NewCoder And what if the text contains multiple `*/` occurrences? – RomanPerekhrest Aug 05 '19 at 18:51
  • I have a feeling that you're trying to parse C++ code with Python. [Are you writing your own compiler?](https://stackoverflow.com/questions/1444961/is-there-a-good-python-library-that-can-parse-c). – pault Aug 05 '19 at 19:16
  • `string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)` works – Aug 05 '19 at 20:03

5 Answers


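The problem with your first regex is that `.` does not match newlines (and the unescaped `*` at the start of the lookahead `(?=*/)` makes that pattern fail to compile at all). With your second one you were closer, but you left out the `*` after `.`, so `^.\*/` only matches a single character followed by `*/`. This would work: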

string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
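To check this diagnosis, you can run all three patterns against the sample string. This is a small sketch; the variable names `sample`, `second` and `fixed` are only for illustration.

```python
import re

# Same text as in the question, written with explicit "\n" escapes
sample = " something\nother things\netc. */ extra text. \n"

# First attempt: the "*" right after "(?=" has nothing to repeat,
# so the pattern raises re.error instead of compiling.
try:
    re.compile("^(.*)(?=*/)")
except re.error as exc:
    print("first pattern is invalid:", exc)

# Second attempt: "^.\*/" means "exactly one character, then */" at the start,
# which never occurs here, so re.sub returns the input unchanged.
second = re.sub(re.compile(r"^.\*/", re.DOTALL), "", sample)
print("second pattern left the string unchanged:", second == sample)

# Fixed pattern: ".*" with DOTALL runs across the newlines up to "*/".
fixed = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", sample)
print(repr(fixed))
```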

You can also just take the part of the string that comes after the first `*/`:

string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
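The two snippets above behave differently when the text contains more than one `*/`, the case raised in the comments. A sketch on a made-up string (`text` is hypothetical), assuming you want either everything after the last marker or everything after the first one:

```python
import re

# Hypothetical input with two "*/" markers
text = "a */ b */ c"

# Greedy ".*" grabs as much as it can, so it removes up to the LAST "*/".
greedy = re.sub(r"^.*\*/", "", text, flags=re.DOTALL)

# Lazy ".*?" stops as early as possible, so it removes up to the FIRST "*/".
lazy = re.sub(r"^.*?\*/", "", text, flags=re.DOTALL)

# re.search also finds the first "*/"; it returns None when there is no match,
# so guard before calling .group().
m = re.search(r"(\*/)(.*)", text, re.DOTALL)
after_first = m.group(2) if m else text

print(repr(greedy))       # ' c'
print(repr(lazy))         # ' b */ c'
print(repr(after_first))  # ' b */ c'
```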
Akaisteph7

You can probably do this without regular expressions (this assumes `*/` is followed by a space, as in your example):

string[string.index("*/ ")+3:]

And if you also want to strip the trailing whitespace, including the newline:

string[string.index("*/ ")+3:].rstrip()
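A variant of the same idea that does not depend on a space after `*/`, and does not raise `ValueError` when the marker is missing (as `str.index` does), is `str.partition`. A minimal sketch using the question's sample text:

```python
sample = " something\nother things\netc. */ extra text. \n"

# str.partition splits at the first occurrence of the separator and never raises:
# it returns (before, sep, after), with sep == "" when the separator is missing.
before, sep, after = sample.partition("*/")
result = after.strip() if sep else sample  # keep the input as-is if there is no "*/"
print(result)  # extra text.

# str.rpartition does the same but splits at the LAST occurrence instead.
print(repr("a */ b */ c".rpartition("*/")[2]))  # ' c'
```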
ipaleka

Update: After doing some research, I found that the pattern `(\n|.)` for matching everything including newlines is inefficient. I've updated the answer to use `[\s\S]` instead, as shown in the answer I linked.


The problem is that `.` in Python regex matches everything except newlines (unless the `re.DOTALL` flag is set). For a regex solution, you can do the following:

import re

strng = ''' something
other things
etc. */ extra text. 
'''

print(re.sub(r"[\s\S]+\*/", "", strng))  # raw string avoids invalid-escape warnings
# extra text.

Add `.strip()` if you want to remove the remaining leading and trailing whitespace.
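The character class `[\s\S]` is one way to match any character including newlines; the `re.DOTALL` flag is another. A short sketch showing that the two give the same result on the sample string (both are greedy, so with several `*/` markers both remove everything up to the last one):

```python
import re

strng = " something\nother things\netc. */ extra text. \n"

# [\s\S] matches any character, including "\n", without needing a flag.
via_class = re.sub(r"[\s\S]+\*/", "", strng)

# With re.DOTALL, "." also matches "\n", so this pattern is equivalent here.
via_dotall = re.sub(r".+\*/", "", strng, flags=re.DOTALL)

print(via_class == via_dotall)  # True
print(via_class.strip())        # extra text.
```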

pault

To keep the text before that symbol, you can do:

split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)

which gives you:

 something
other things
etc.
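The snippet above keeps the text before `*/`, whereas the question asks for what comes after it. Slicing on the other side of `boundary` gives that part. A sketch, assuming `*/` appears as its own space-separated word, as in the example:

```python
sample = " something\nother things\netc. */ extra text. \n"

split_str = sample.split(' ')
boundary = split_str.index('*/')  # raises ValueError if "*/" is not a separate word

# Everything before the marker (what the snippet above prints) ...
before = ' '.join(split_str[:boundary])
# ... and everything after it, which is what the question asks for.
after = ' '.join(split_str[boundary + 1:])

print(repr(after))  # 'extra text. \n'
```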
Anna Nevison
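# Split on every "*/", drop the piece before the first one,
# and re-join the rest so any later "*/" markers are preserved
string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)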

which leaves `string` as

' extra text. \n'
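Passing a `maxsplit` of 1 to `str.split` gives the same result without the re-join step. A sketch using the question's sample text:

```python
sample = " something\nother things\netc. */ extra text. \n"

# maxsplit=1 splits only at the first "*/", so any later "*/" stays in the tail
# and nothing needs to be re-joined.
parts = sample.split('*/', 1)
result = parts[1] if len(parts) == 2 else sample  # unchanged if there is no "*/"

print(repr(result))  # ' extra text. \n'
```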
Gourav Bansal