
I know how to delete standalone numbers (numbers that form a word of their own) in Python with:

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)

I'm wondering whether it would be possible to perform the same action while keeping dates:

s = "I want to delete numbers like 84 but not dates like 2015"

In plain English, a quick and dirty rule could be: if the number starts with 18, 19, or 20 and is exactly 4 digits long, don't delete it.

Wiktor Stribiżew
Antoine

1 Answer


To match any digit sequences other than 4-digit sequences starting with 18/19/20, you can use

r'\b(?!(?:18|19|20)\d{2}\b)\d+\b'

See regex demo

The regex matches:

  • \b - leading word boundary
  • (?!(?:18|19|20)\d{2}\b) - a negative lookahead that stops the subsequent pattern \d+ from matching when the digits start with 18, 19 or 20 and are followed by exactly two more digits (\d{2}) and a word boundary (note you can shorten the lookahead to (?!(?:1[89]|20)\d{2}\b), but many people frown upon that because readability suffers)
  • \d+ - 1 or more digits
  • \b - trailing word boundary

Python code:

import re

p = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')
test_str = "Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, 4 5 6 created in 2008"
print(p.sub("", test_str))  # 4, 5 and 6 are removed; 2008 is kept
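
As a rough sketch, the same pattern can be applied to the sentence from the question. The helper name `strip_numbers_keep_years` and the whitespace clean-up step are my own additions for illustration, not part of the original snippet; removing a number leaves a double space behind, so the sketch collapses whitespace afterwards.

```python
import re

# Pattern from the answer: any digit run except 4-digit runs starting with 18, 19 or 20.
NUMBER_NOT_YEAR = re.compile(r'\b(?!(?:18|19|20)\d{2}\b)\d+\b')

def strip_numbers_keep_years(text):
    """Remove standalone numbers but keep year-like tokens (hypothetical helper)."""
    without_numbers = NUMBER_NOT_YEAR.sub("", text)
    # Collapse the extra whitespace left behind by the removed tokens.
    return " ".join(without_numbers.split())

s = "I want to delete numbers like 84 but not dates like 2015"
print(strip_numbers_keep_years(s))
# I want to delete numbers like but not dates like 2015
```

Note that a 5-digit number such as 20155 is still removed, because the lookahead requires a word boundary right after the fourth digit.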
Wiktor Stribiżew
  • Just thought about float values, and came up with [`\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?\d+\b`](https://regex101.com/r/tB0qR8/2). – Wiktor Stribiżew Dec 10 '15 at 10:55
  • Thanks for the quick reply, but with `Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, 4 5 6 created in 2008` it only removes the first number (4), not 5 and 6... see: https://regex101.com/r/tB0qR8/1 – Antoine Dec 10 '15 at 11:00
  • With respect to your first comment, thanks a lot for this improvement. – Antoine Dec 10 '15 at 11:02
  • [It works](https://regex101.com/r/tB0qR8/3) if you add `/g` flag. In Python, `re.sub` replaces all matches (no need to specify anything) by default. – Wiktor Stribiżew Dec 10 '15 at 11:04
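
To illustrate the two points from the comments, here is a minimal sketch using the float-aware pattern from the first comment. The sample sentence and the `squeeze` helper are made up for the example. It shows that `sub` replaces every match by default, and that passing `count=1` reproduces the "only the first number is removed" behaviour seen on regex101 without the `/g` flag.

```python
import re

# Float-aware pattern quoted in the first comment.
FLOAT_AWARE = re.compile(r'\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?\d+\b')

def squeeze(text):
    """Collapse runs of whitespace (hypothetical helper for readable output)."""
    return " ".join(text.split())

sample = "pi is 3.14 in 2015 but 2015.5 and 42 go"

# Default: every match is replaced (count=0 means "no limit").
all_removed = squeeze(FLOAT_AWARE.sub("", sample))
# count=1: only the first match is replaced.
first_only = squeeze(FLOAT_AWARE.sub("", sample, count=1))

print(all_removed)  # pi is in 2015 but and go
print(first_only)   # pi is in 2015 but 2015.5 and 42 go
```

The year 2015 survives, while 2015.5 is treated as a float and removed, because the inner lookahead `(?!\.\d)` only protects a year that is not followed by a decimal part.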