
I have a bunch of data that looks like this:

Bigtable,[4] MariaDB[5]

How do I use Python's re library to remove those [4]-style citation markers?

Konrad Rudolph

2 Answers


You can use re.sub to remove those citation markers:

>>> import re
>>> s = "Bigtable,[4] MariaDB[5]"
>>> re.sub(r'\[.*?\]', '', s)
'Bigtable, MariaDB'

The regex \[.*?\] matches every substring that starts with [ and ends with ], with as few characters between the brackets as possible.
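To illustrate why the non-greedy ? matters, here is a small sketch comparing it with the greedy form on the same string. The variable names greedy and lazy are only for illustration.

```python
import re

s = "Bigtable,[4] MariaDB[5]"

# Greedy: .* runs from the first "[" to the last "]" in the string,
# so everything in between is removed as well.
greedy = re.sub(r'\[.*\]', '', s)

# Non-greedy: .*? stops at the first "]" after each "[".
lazy = re.sub(r'\[.*?\]', '', s)

print(repr(greedy))  # 'Bigtable,'
print(repr(lazy))    # 'Bigtable, MariaDB'
```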

If you only want to remove square brackets with numbers inside, use this regex instead: \[\d+\]
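As a rough comparison of the two patterns, the sketch below uses a made-up input: the "[citation needed]" tag is not part of the original data and is added only to show the difference.

```python
import re

# Hypothetical input; "[citation needed]" is invented for this example.
s = "Bigtable,[4] MariaDB[5] [citation needed]"

# Removes anything in square brackets, including the text tag.
any_brackets = re.sub(r'\[.*?\]', '', s)

# Removes only brackets that contain one or more digits.
digits_only = re.sub(r'\[\d+\]', '', s)

print(repr(any_brackets))  # 'Bigtable, MariaDB '
print(repr(digits_only))   # 'Bigtable, MariaDB [citation needed]'
```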

martin

Solution

It is slightly unclear what you want in your output: drop only the markers that follow a comma, such as ,[4], or drop all citation markers, such as [4] and [5].

The following two examples show how to handle these two scenarios with Python. However, if you want to use the command line, here is what you can do with echo + sed (the m flag is a GNU extension and is not needed here, so it is omitted for portability).

echo "Bigtable,[4] MariaDB[5]" | sed -E 's/\[[0-9]+\]//g'

## Output
Bigtable, MariaDB

Example-1: Python

Assuming that you only want to remove citation markers that directly follow a comma (like ,[4]) and keep the others (like [5]), this should work for you.

See example in Regex101.com: scenario-1

import re

# define regex pattern(s)
pattern1 = r"(?:,\s*)(\[\d+\])(?:(\s*)?)"

# compile regex pattern(s) for speed
pat1 = re.compile(pattern1)

# evaluate regex substitution (text is defined under Dummy Data below)
result1 = pat1.sub(r',\g<2>', text)

print(result1)

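With ,\g<2> we replace the whole match (the comma, any whitespace after it, the marker such as [4], and the whitespace after the marker) with a comma followed by group 2, which is the whitespace after the marker. Markers that are not preceded by a comma, such as [5], are left untouched. See here for more details.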

Output:

Bigtable, MariaDB[5]
Bigtable, MariaDataBase[15]
Bigtable, GloriaDB[51]
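To make the group numbering in Example-1 concrete, this sketch inspects the match on one line of the dummy data. The names line and m are only for illustration.

```python
import re

pat1 = re.compile(r"(?:,\s*)(\[\d+\])(?:(\s*)?)")
line = "Bigtable, [14] GloriaDB[51]"

m = pat1.search(line)
print(repr(m.group(0)))  # ', [14] '  the whole match
print(repr(m.group(1)))  # '[14]'     the citation marker
print(repr(m.group(2)))  # ' '        whitespace after the marker

# The whole match is replaced by a comma plus group 2.
result = pat1.sub(r',\g<2>', line)
print(result)  # Bigtable, GloriaDB[51]
```

The non-capturing groups (?:...) do not count toward the numbering, which is why the trailing whitespace is group 2 and not group 3.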

Example-2: Python

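This removes all citation markers such as [4] and [5]. Whitespace around a marker is kept, so the third line of the output has two spaces after the comma.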

See example in Regex101.com: scenario-2

import re

# define regex pattern(s)
pattern2 = r"(\[\d+\])"

# compile regex pattern(s) for speed
pat2 = re.compile(pattern2)

# evaluate regex substitution (text is defined under Dummy Data below)
result2 = pat2.sub('', text)

print(result2)

Output:

Bigtable, MariaDB
Bigtable, MariaDataBase
Bigtable,  GloriaDB
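If the doubled space in the last line is unwanted, one possible variation (a sketch, not part of the original pattern) is to also consume any spaces or tabs directly before each marker. The names pat and cleaned are only for illustration.

```python
import re

text = """\
Bigtable,[4] MariaDB[5]
Bigtable,[40] MariaDataBase[15]
Bigtable, [14] GloriaDB[51]
"""

# [ \t]* is used instead of \s* so that newlines are never swallowed.
pat = re.compile(r"[ \t]*\[\d+\]")
cleaned = pat.sub('', text)

print(cleaned)
# Bigtable, MariaDB
# Bigtable, MariaDataBase
# Bigtable, GloriaDB
```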

Dummy Data

# we will use this text for testing the regex
text = """\
Bigtable,[4] MariaDB[5]
Bigtable,[40] MariaDataBase[15]
Bigtable, [14] GloriaDB[51]
"""
CypherX