Remove lines containing numbers attached to letters with Python

Question

I have a txt file containing one sentence per line, and there are lines containing numbers attached to letters. For instance:

The boy3 was strolling on the beach while four seagulls appeared flying.
There were 3 women sunbathing as well.
All children were playing happily.

I would like to remove lines like the first one (i.e. lines with numbers stuck to words) but keep lines like the second, which are properly written.

Has anybody got a slight idea?

You can start by separating the string into words using the `split` method. Then you can loop over the words and check whether each `word` is a number with the `isdigit()` method. If it is a number, you can ignore it; if not, you will need a second loop to check whether the word contains any digits — EnriqueBet, May 29 '22 at 10:31
There is probably a smart way to check this by using regex, but you might need to dig deeper into that — EnriqueBet, May 29 '22 at 10:32
I have made this regex `[A-ZÁÉÍÓÚÜÑa-záéíóúüñ][0-9]+|[0-9]+[A-ZÁÉÍÓÚÜÑa-záéíóúüñ]` but it is kind of ugly haha. Note that it must work for Spanish, which is why I included accented letters and ñ. — Javier Saldaña Martínez, May 29 '22 at 10:39
This might help: https://stackoverflow.com/questions/6314614/match-any-unicode-letter Please post an answer once you find a solution you're happy with. — Joooeey, May 29 '22 at 10:44
Hello @Joooeey. I will finally be using the one I posted. It works although it is ugly!! — Javier Saldaña Martínez, May 29 '22 at 10:58
Do you at least have the code to read a text file and split it into lines, or to remove lines? Please provide a minimal reproducible example and do some research here, e.g. [`[python] words containing numbers`](https://stackoverflow.com/search?q=%5Bpython%5D%20words%20containing%20numbers) — hc_dev, May 29 '22 at 11:43
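
For reference, the `split` / `isdigit()` idea from the first comment could be sketched without any regex roughly as follows. The function name `has_digit_attached_to_letter` and the `sentences` list are made up for illustration, and the adjacency check is one possible reading of "numbers stuck to words", not something the commenter spelled out. Since `str.isalpha()` is Unicode-aware, accented letters and ñ are treated as letters.

```python
def has_digit_attached_to_letter(line: str) -> bool:
    """Return True if any word in `line` has a digit directly next to a letter."""
    for word in line.split():
        if word.isdigit():
            # A standalone number such as "3" is fine.
            continue
        # Second loop: compare each character with its right-hand neighbour.
        for left, right in zip(word, word[1:]):
            if (left.isalpha() and right.isdigit()) or (left.isdigit() and right.isalpha()):
                return True
    return False


# Hypothetical sample data taken from the question.
sentences = [
    "The boy3 was strolling on the beach while four seagulls appeared flying.",
    "There were 3 women sunbathing as well.",
    "All children were playing happily.",
]

# Keep only the lines without a digit glued to a letter.
kept = [s for s in sentences if not has_digit_attached_to_letter(s)]
print(kept)
```

Only the first sentence is dropped here, because `boy3` has a letter directly followed by a digit.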

score 1 · Answer 1 · answered May 29 '22 at 10:45

You can use a simple regex pattern. Start with `[0-9]+`, which matches one or more consecutive digits, so `6`, `56`, and `56790` all match. To detect sentences that have a number attached to a letter, you could use something like `([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])`. This pattern matches a letter immediately followed by a number, or a number immediately followed by a letter. You can search strings using:

import re

lines = [
    'The boy3 was strolling on the beach while 4 seagulls appeared flying.',
    'There were 3 women sunbathing as well.',
]

kept = []
for line in lines:
    res = re.search("([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])", line)
    if res is not None:
        # a number is attached to a letter: remove (skip) this line
        continue
    kept.append(line)

print(kept)

However, `[a-zA-Z]` only covers unaccented ASCII letters, so you can add more characters to the letter classes if your sentences can include special characters such as accented letters.
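
As a rough sketch of what adding characters could look like for Spanish text, the letter class can be extended with the accented vowels, ü and ñ, as the asker did in the comments on the question. The names `LETTER`, `ATTACHED`, `is_clean` and the sample sentences are assumptions made for this example only.

```python
import re

# Letters allowed next to a digit: ASCII plus Spanish accented vowels, ü and ñ.
LETTER = "[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]"

# A letter directly followed by a digit, or a digit directly followed by a letter.
ATTACHED = re.compile(f"{LETTER}[0-9]|[0-9]{LETTER}")


def is_clean(line: str) -> bool:
    """Return True if no digit is attached to a letter in `line`."""
    return ATTACHED.search(line) is None


# Hypothetical Spanish sample lines.
spanish_lines = [
    "Pidió un café3 con leche.",
    "Había 3 mujeres tomando el sol.",
]

clean = [line for line in spanish_lines if is_clean(line)]
print(clean)
```

The ASCII-only class `[a-zA-Z]` would not flag `café3`, because `é` falls outside `a-z`; the extended class does.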

The asker wants to remove the lines that match, so the condition should be `if res is not None` or just `if re.search("([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])", line):` — azro, May 29 '22 at 11:35
It could also be `re.search(r"([a-z]\d)|(\d[a-z])", line, flags=re.IGNORECASE)` — azro, May 29 '22 at 11:35

score 0 · Answer 2 · answered May 29 '22 at 15:31

Suppose your input text is stored in the file `in.txt`. You can then use the following code:

import re

with open("in.txt", "r") as f:
    for line in f:
        # Print only the lines in which no digit touches a non-digit word character.
        if not re.search(r'(?!\d)[\w]\d|\d(?!\d)[\w]', line, flags=re.UNICODE):
            print(line, end="")

The pattern `(?!\d)[\w]` matches a word character (`\w`) that is not a digit, which covers letters (including accented ones) and also the underscore. The idea is borrowed from https://stackoverflow.com/a/12349464/2740367
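
To see what that sub-pattern actually matches, here is a small sketch. The sample strings and variable names are invented for illustration, and the `[^\W\d_]` class is a commonly used alternative that is not part of this answer; it excludes the underscore as well.

```python
import re

sample = "añ_3"

# (?!\d)\w : any word character that is not a digit (underscore included).
with_underscore = re.findall(r"(?!\d)\w", sample)
print(with_underscore)

# [^\W\d_] : the same idea, but with the underscore excluded as well.
letters_only = re.findall(r"[^\W\d_]", sample)
print(letters_only)

# The difference shows up on text such as "file_1".
answer_pattern = r"(?!\d)\w\d|\d(?!\d)\w"
letters_pattern = r"[^\W\d_]\d|\d[^\W\d_]"
text = "see file_1 for details"
flagged_by_answer = re.search(answer_pattern, text) is not None
flagged_by_letters = re.search(letters_pattern, text) is not None
print(flagged_by_answer, flagged_by_letters)
```

If lines containing something like `file_1` should be kept, the underscore-free class may be the better fit; for plain sentences the two behave the same.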
