Regex to identify lines in a text with just numbers or single characters

Question

Folks,

I have a use case that I know can be solved with ordinary string methods in Python, but I am looking for a more regex-based way to solve it.

Use case:

Given text from a file, I want to remove every line that contains only one of the following:

A single number (optionally enclosed in brackets), such as 29, [29], (29), {29}

A single character (optionally enclosed in brackets), such as m, [m], (m), {m}

Nothing at all (empty lines)

The Python way I know of:

Strip any whitespace from both ends.

Strip the surrounding brackets (if any).

For numbers: check whether the string is all digits using str.isdigit().

For characters: check whether the string's length equals 1.
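The string-method steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the asker's own code: the helper names `is_noise_line` and `remove_noise_lines` are hypothetical, and it assumes bracket characters are simply stripped from both ends (so an unbalanced line such as `(29` is also treated as removable).

```python
def is_noise_line(line: str) -> bool:
    """Hypothetical helper: True if the line should be removed."""
    s = line.strip()        # strip whitespace (including the newline) from both ends
    s = s.strip("()[]{}")   # strip any bracket characters from both ends
    # Empty line, a number such as 29, or a single character such as m
    return s == "" or s.isdigit() or len(s) == 1


def remove_noise_lines(text: str) -> str:
    """Hypothetical helper: keep only the lines that are not noise."""
    lines = text.splitlines(keepends=True)  # keep the original line endings
    return "".join(line for line in lines if not is_noise_line(line))


sample = (
    "hello world...\n"
    "again hello world...\n"
    "\n"
    "29 \n"
    "\n"
    "..\n"
    "[a]\n"
    "bye bye...\n"
    "see you..\n"
)
print(remove_noise_lines(sample), end="")
```

Note that `str.isdigit()` also accepts some non-ASCII digit characters (superscripts, for example), which may or may not be what you want.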

Example:

hello world...
again hello world...

29 

..
[a]
bye bye...
see you..

Expected Output:

hello world...
again hello world...
..
bye bye...
see you..

I want to understand how to perform these steps using a single regex (if possible).

Thanks!

If you vote to close this question, please leave a comment explaining why. – Saurabh Gokhale, Nov 29 '18 at 13:39

Jan · Answer 1 · 2018-11-29T14:47:27.517

You could use

^[({\[]?(?:\d+|[a-z])?[)}\]]?\s*$[\n\r]

Matches are replaced by an empty string (with the m flag enabled, so that ^ and $ match at each line); see a demo on regex101.com.
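In Python, the pattern could be applied with `re.sub` and the `re.MULTILINE` flag (the m flag). A sketch using the sample text from the question:

```python
import re

# The pattern from this answer, unchanged; re.MULTILINE makes ^ and $ match per line
PATTERN = re.compile(r"^[({\[]?(?:\d+|[a-z])?[)}\]]?\s*$[\n\r]", re.MULTILINE)

sample = (
    "hello world...\n"
    "again hello world...\n"
    "\n"
    "29 \n"
    "\n"
    "..\n"
    "[a]\n"
    "bye bye...\n"
    "see you..\n"
)

# Each matching line, together with its newline, is replaced by ""
cleaned = PATTERN.sub("", sample)
print(cleaned, end="")
```

Because the pattern requires a newline after the line, a matching line at the very end of a file with no trailing newline is not removed. Also, `[a-z]` covers only lowercase letters, so a line such as `[M]` is kept.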
When starting to learn regular expressions, turn on "verbose" mode as often as possible.

In this case:

^         # start of a line (in multiline mode, m flag)
[({\[]?   # a character class ([...]) matching (, { or [, zero or one time
(?:       # start of a non-capturing group
    \d+   # one or more digits
|         # or
    [a-z] # a single lowercase letter a, b, c, ..., z
)?        # the whole group zero or one time
[)}\]]?   # one of ), } or ], zero or one time
\s*       # optional whitespace
$         # end of the line
[\n\r]    # one newline character (\n or \r)
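In Python, the commented form above can be used directly through verbose mode (`re.VERBOSE`, the x flag), where unescaped whitespace in the pattern is ignored and `#` starts a comment. This sketch checks that it behaves like the compact one-line pattern on the sample text:

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    ^           # start of a line (re.MULTILINE)
    [({\[]?     # optional opening bracket: (, { or [
    (?:         # non-capturing group
        \d+     #   one or more digits
      |         #   or
        [a-z]   #   a single lowercase letter
    )?          # the group is optional
    [)}\]]?     # optional closing bracket: ), } or ]
    \s*         # optional whitespace
    $           # end of the line
    [\n\r]      # one newline character
    """,
    re.MULTILINE | re.VERBOSE,
)

COMPACT_PATTERN = re.compile(r"^[({\[]?(?:\d+|[a-z])?[)}\]]?\s*$[\n\r]", re.MULTILINE)

sample = "hello world...\nagain hello world...\n\n29 \n\n..\n[a]\nbye bye...\nsee you..\n"

# Both spellings of the pattern should remove exactly the same lines
assert VERBOSE_PATTERN.sub("", sample) == COMPACT_PATTERN.sub("", sample)
print(VERBOSE_PATTERN.sub("", sample), end="")
```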

For more information, see Learning regular expressions or Mastering Regular Expressions.
