
I am trying to process a multi-line string: replace parts of some lines and remove the others. Here is the code:

>>> import re
>>> txt
'1 Introduction\nPart I: Applied Math and Machine Learning Basics\n2 Linear Algebra'
>>> tmp = []
>>> for line in txt.splitlines():
...     if re.findall('[0-9]', line):
...         replaced = re.sub('[0-9]', '#', line)
...         tmp.append(replaced)
>>> print(tmp)
['# Introduction', '# Linear Algebra']

This code does the job, but I am not sure it is the most efficient way.

I tried this post and the docs, but none of their multiple-match examples seems to handle multi-line strings.

Is there a more efficient way to do this?

  • There is nothing wrong with your code. You can compress it into a single line using a list comprehension: `[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line)]` if you find that more readable. – Selcuk Apr 01 '19 at 05:50

1 Answer

You can use a list comprehension for the code in your question, which makes it neater.

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line) ]

# Output 
['# Introduction', '# Linear Algebra']

Also, as @CertainPerformance mentioned in the comments, since you only need to know whether a line contains a digit, it is better to use search instead of findall: search stops at the first match, whereas findall scans the whole line and builds a list of every match. The list comprehension then becomes:

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.search('[0-9]', line) ]

# Output 
['# Introduction', '# Linear Algebra']

I see a small performance improvement with search on my machine:

%%timeit 1000000

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.search('[0-9]', line) ]

# 4.76 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit 1000000

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line) ]

# 5.21 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
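
Both versions above scan each matching line twice: once to test for a digit and once to substitute. As a rough sketch of an alternative, `re.subn` returns the new string together with the number of replacements, so the test and the substitution can share a single pass. The function name `mask_digit_lines` and the precompiled `DIGIT` pattern are my own choices for illustration, and I have not timed this against the versions above, so treat it as an option to benchmark rather than a guaranteed speed-up.

```python
import re

# Precompiled pattern; the name DIGIT is an assumption made for this sketch.
DIGIT = re.compile(r"[0-9]")


def mask_digit_lines(text):
    """Return the lines of text that contain a digit, with each digit replaced by '#'."""
    result = []
    for line in text.splitlines():
        # subn returns (new_string, number_of_substitutions_made).
        replaced, count = DIGIT.subn("#", line)
        if count:  # keep only lines where at least one digit was replaced
            result.append(replaced)
    return result


txt = "1 Introduction\nPart I: Applied Math and Machine Learning Basics\n2 Linear Algebra"
print(mask_digit_lines(txt))
# ['# Introduction', '# Linear Algebra']
```

The trade-off is that lines without digits are still passed through `subn`, which returns them unchanged with a count of 0, so any gain depends on how many lines actually contain digits.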