Folks,

I have a use case that I know can be solved with ordinary string methods in Python, but I am looking for a regex-based way to solve it.

Use-case:

Given text read from a file, I want to remove every line that contains only one of the following:

  • A single number, optionally wrapped in brackets, such as 29, [29], (29), {29}
  • A single character, optionally wrapped in brackets, such as m, [m], (m), {m}
  • Nothing at all (an empty line)

Python way (I know of):

  • Strip any whitespace from both ends of the line
  • Strip the surrounding brackets (if any)
  • For a number: check whether the remaining string is all digits using str.isdigit()
  • For a character: check whether the length of the remaining string equals 1
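
For reference, the string-method steps above could be sketched roughly as follows. The names is_removable and filter_lines are made up for illustration, and the sketch assumes that stripping every leading and trailing bracket character (so "((29))" is also treated as a single number) is acceptable.

```python
def is_removable(line: str) -> bool:
    """Return True for blank lines and for lines holding only one number or one character."""
    core = line.strip()            # drop whitespace at both ends
    core = core.strip("()[]{}")    # drop surrounding brackets, if any
    return core == "" or core.isdigit() or len(core) == 1


def filter_lines(text: str) -> str:
    """Keep only the lines that is_removable() does not flag (hypothetical helper)."""
    kept = [line for line in text.splitlines() if not is_removable(line)]
    return "\n".join(kept)
```

Calling filter_lines on the example text below should leave only the five lines listed under "Expected Output".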

Example:

hello world...
again hello world...

29 

..
[a]
bye bye...
see you..

Expected Output:

hello world...
again hello world...
..
bye bye...
see you..

I want to understand how to perform these steps using a single regex (if possible).

Thanks!

Saurabh Gokhale

1 Answer

You could use

^[({\[]?(?:\d+|[a-z])?[)}\]]?\s*$[\n\r]

Replace each match with an empty string, with the multiline (m) flag enabled; see a demo on regex101.com.
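
In Python, that replacement might look like the sketch below. The names LINE_RE, sample and cleaned are illustrative only; the pattern is the one given above, and re.MULTILINE is needed so that ^ and $ match at every line rather than only at the ends of the whole text.

```python
import re

# The pattern from this answer; re.MULTILINE lets ^ and $ match at each line.
LINE_RE = re.compile(r"^[({\[]?(?:\d+|[a-z])?[)}\]]?\s*$[\n\r]", re.MULTILINE)

# The example text from the question.
sample = (
    "hello world...\n"
    "again hello world...\n"
    "\n"
    "29 \n"
    "\n"
    "..\n"
    "[a]\n"
    "bye bye...\n"
    "see you..\n"
)

# Every matching line is replaced by nothing, newline included.
cleaned = LINE_RE.sub("", sample)
print(cleaned, end="")
```

Because the pattern ends with [\n\r], a matching final line that has no trailing newline is left in place, so it may be worth making sure the text ends with a newline first.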
When starting to learn regular expressions, turn the "verbose" mode on as often as possible.


In this case:
^         # the start of a line in multiline mode (m flag)
[({\[]?   # a character class ([...]) matching (, { or [, zero or one time
(?:       # opening of a non-capturing group
    \d+   # one or more digits
|         # or
    [a-z] # a single lowercase letter a, b, c, ... z
)?        # the whole group, zero or one time
[)}\]]?   # one of ), } or ], zero or one time
\s*       # optional trailing whitespace
$         # end of the line
[\n\r]    # the newline character that ends the line
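
The annotated form can also be compiled as written, since Python's re.VERBOSE flag ignores whitespace and # comments outside character classes. The following is a small sketch of that idea; VERBOSE_RE, text and result are names chosen here for illustration.

```python
import re

VERBOSE_RE = re.compile(
    r"""
    ^               # start of a line (needs re.MULTILINE)
    [({\[]?         # optional opening bracket
    (?:
        \d+         # one or more digits
      |             # or
        [a-z]       # one lowercase letter
    )?              # the whole group is optional
    [)}\]]?         # optional closing bracket
    \s*             # optional trailing whitespace
    $               # end of the line
    [\n\r]          # the newline that ends the line
    """,
    re.MULTILINE | re.VERBOSE,
)

text = "first\n(7)\n\nx\nlast\n"
result = VERBOSE_RE.sub("", text)
print(result, end="")
```

Note that [a-z] only covers lowercase letters; re.IGNORECASE could be added to the flags if uppercase single characters should be removed as well.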

For more information, see Learning regular expressions or Mastering Regular Expressions.

Jan