I'm filtering and cleaning some data such as this: 東急オリジナルお礼品(18). I tried the following to strip out the digits and the parentheses.

Case 1: categoryData = re.sub(r"\(.*?\)", "", category_element.find("span").get_text())

Case 2: categoryData = re.sub(r"/\([0-9]+\)/", "", category_element.find("span").get_text())

It is still not working.

My goal is to have only this data: 東急オリジナルお礼品

Ariel Anasco

1 Answer

import re

# Match a literal "(", one or more digits, then a literal ")"
pattern = r"\([0-9]+\)"
string = "東急オリジナルお礼品(18)"
categoryData = re.sub(pattern, "", string)
print(categoryData)

Result:

東急オリジナルお礼品
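
As for why the attempts in the question behave differently: Python passes the pattern string to `re` as-is, so the `/.../` delimiters in Case 2 are treated as literal slashes and the pattern never matches. Case 1 should already work on ASCII parentheses; if it does not, the scraped text may contain full-width parentheses instead. Here is a minimal sketch under that assumption, where `strip_count` and `COUNT_SUFFIX` are hypothetical names, not part of the original code:

```python
import re

# Matches a trailing count such as "(18)". The full-width parentheses
# "（" and "）" are an assumption about what the scraped text may contain.
COUNT_SUFFIX = re.compile(r"\s*[(（][0-9]+[)）]\s*$")

def strip_count(text):
    """Remove a trailing parenthesised number (hypothetical helper)."""
    return COUNT_SUFFIX.sub("", text)

sample = "東急オリジナルお礼品(18)"

# Case 2 from the question: the slashes are literal characters in Python,
# so nothing matches and the string comes back unchanged.
case2 = re.sub(r"/\([0-9]+\)/", "", sample)

print(strip_count(sample))
print(case2 == sample)
```

Anchoring the pattern with `$` only removes a count at the end of the string, so parentheses elsewhere in a category name are left alone.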
Diego Miguel
  • I would go more in this direction, with the pattern `match = r"([^\w\s]+|\d+)"`. `\w` matches alphanumeric characters in any alphabet, so if you also remove the digits, you strip everything that is neither a letter (in any alphabet) nor whitespace. You might add punctuation; the OP wasn't clear about that. – Grzegorz Skibinski Mar 18 '21 at 16:13
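
The broader pattern from the comment can be tried as follows. This is only a sketch, with `strip_non_letters` and `NON_LETTERS` as hypothetical names. Note that it removes every digit and every punctuation character in the string, not only the trailing count, so it fits only if category names never contain digits or symbols worth keeping.

```python
import re

# Pattern from the comment: runs of characters that are neither word
# characters nor whitespace, or runs of digits. For str patterns in
# Python 3, \w and \d are Unicode-aware by default.
NON_LETTERS = re.compile(r"([^\w\s]+|\d+)")

def strip_non_letters(text):
    """Drop punctuation and digits, keep letters, underscores and whitespace."""
    return NON_LETTERS.sub("", text)

print(strip_non_letters("東急オリジナルお礼品(18)"))
print(strip_non_letters("Gift set 2 (18)"))
```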