3

I have some text like so:

1.6 # blah blah blah
# fjsadfklj slkjf yes 3.4
1.8*
1.9 1.10 #blah
#blah
1.11

I want to clean it up by removing all # characters plus anything following them on the same line. In other words, I desire:

1.6
1.8*
1.9 1.10
1.11

What is the best way to approach this? Via simple methods like partition, or maybe regexes?

norman
  • Possible duplicate of http://stackoverflow.com/questions/1706198/python-how-to-ignore-comment-lines-when-reading-in-a-file Note that the best answer there is not the top-rated one; http://stackoverflow.com/a/27178714/2284490 is probably the most robust answer. – Cireo May 02 '17 at 02:31

2 Answers

3

You may try this,

re.sub(r'\s*#.*', '', s)

\s* also matches any whitespace preceding the #, whether horizontal (spaces, tabs) or vertical (newline, carriage return), so comment-only lines are removed together with the line break before them.
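
To see the effect on the sample from the question, here is a minimal sketch. The function name strip_comments is my own choice for illustration and not part of the answer. Two caveats under this approach: a comment on the very first line leaves an empty first line behind, and a # inside a quoted string would be treated as a comment too.

```python
import re

def strip_comments(text):
    """Remove each '#' comment together with the whitespace before it."""
    # \s* also consumes a preceding newline, so comment-only lines vanish.
    return re.sub(r'\s*#.*', '', text)

sample = """1.6 # blah blah blah
# fjsadfklj slkjf yes 3.4
1.8*
1.9 1.10 #blah
#blah
1.11"""

print(strip_comments(sample))
```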


Avinash Raj
2

Maybe this does what you want?

example = '''1.6 # blah blah blah
# fjsadfklj slkjf yes 3.4
1.8*
1.9 1.10 #blah
#blah
1.11'''

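for line in example.splitlines():
    # Keep only the text before the first '#', minus trailing whitespace.
    code = line.split('#', 1)[0].rstrip()
    if code:  # skip lines that contained nothing but a comment
        print(code)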

If you also want the comment text, the code is easily modified to capture it as well.
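
One possible way to capture the comment text is str.partition, which the question mentions. This is only a sketch; the helper name split_comment and the choice to return None when a line has no # are assumptions of mine, not part of the answer.

```python
def split_comment(line):
    """Return (code, comment) for one line; comment is None if there is no '#'."""
    # partition() always returns three parts; sep is '' when '#' is absent.
    code, sep, comment = line.partition('#')
    return code.rstrip(), (comment.strip() if sep else None)

example = '''1.6 # blah blah blah
# fjsadfklj slkjf yes 3.4
1.8*
1.9 1.10 #blah
#blah
1.11'''

for line in example.splitlines():
    code, comment = split_comment(line)
    if code:  # skip lines that held only a comment
        print(code)
```

Returning both parts keeps the comment available to the caller without a second pass over the text.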

Noctis Skytower
  • This is the superior method because it is simple and explicit. – Josh J Jul 31 '15 at 17:51
  • A naive `timeit` shows split is also ~4x as fast. `python -m timeit 'strs = ("x"*(100 - i%101) + "#" + "y"*100 for i in xrange(10000)); import re' 'for s in strs: re.sub(r"\s*#.*", "", s)'` vs `s.split("#", 1)[0]`. 31.5 msec vs 7.02 msec on my machine – Cireo May 02 '17 at 02:28
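
The timing in the comment above uses xrange, which exists only in Python 2. A rough Python 3 sketch of the same comparison could look like the following; the function names and the repeat count of 10 are my own assumptions, and the numbers will differ from machine to machine.

```python
import re
import timeit

# Same shape of test data as in the comment: 'x' padding, a '#', then 100 'y's.
strs = ["x" * (100 - i % 101) + "#" + "y" * 100 for i in range(10000)]

def with_regex():
    return [re.sub(r"\s*#.*", "", s) for s in strs]

def with_split():
    return [s.split("#", 1)[0] for s in strs]

# Both approaches should agree on this data before timing them.
assert with_regex() == with_split()

regex_time = timeit.timeit(with_regex, number=10)
split_time = timeit.timeit(with_split, number=10)
print(f"regex: {regex_time:.3f}s  split: {split_time:.3f}s")
```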