Remove unmixed numbers from file

Question

Say I have a file called input.txt that looks like this

I listened to 4 u2 albums today
meet me at 5
squad 4ever

I want to filter out the numbers that stand on their own, so "4" and "5" should go but "u2" and "4ever" should stay the same, i.e. the output should be

I listened to u2 albums today
meet me at
squad 4ever

I've been trying to use this code

for line in fileinput.input("input.txt", inplace=True):
    new_s = ""
    for word in line.split(' '):
        if not all(char.isdigit() for char in word):
            new_s += word
            new_s += ' '
    print(new_s, end='')

Which is pretty similar to the code I found here: Removing numbers mixed with letters from string

But instead of the wanted output I get

I listened to u2 albums today
 meet me at 5
 squad 4ever

As you can see, there are two problems here. First, only the first line loses the digit I want it to lose; "5" is still present in the second line. Second, there is an extra space at the beginning of each new line.

I've been playing around with the code for a while and browsing Stack Overflow, but I can't find where the problem is coming from. Any insights?

The problem is that the last word on the line ends with `\n`. This is not a digit, so it passes the if statement, and the extra space is because you add a space each time in the for loop, including for the last word on the line. — yinnonsanders, Nov 15 '17 at 14:58

score 3 · Accepted Answer · answered Nov 15 '17 at 14:59

str.split(' ') does not remove the trailing newlines from each line. They end up attached to the last word of the line. So for your first problem, the '5' doesn't get removed because it's actually '5\n', and the \n is not a digit.

The second problem is related. When you print the last word of each line, it contains that newline, plus you're adding a space on to the end. That space shows up as the first character of the next line.
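
Both effects can be reproduced on a single line of the sample. This is only an illustrative sketch; the variable names are made up for the demonstration.

```python
# Second line of the sample file, as it is read including its newline.
line = "meet me at 5\n"

words = line.split(' ')   # splits on single spaces only
print(words)              # ['meet', 'me', 'at', '5\n']

# '5\n' contains a newline, so it does not count as all digits and is kept.
is_number = all(char.isdigit() for char in words[-1])
print(is_number)          # False

# Rebuilding the line the way the question's loop does puts the added
# space after the newline, i.e. at the start of the next output line.
new_s = ""
for word in words:
    if not all(char.isdigit() for char in word):
        new_s += word + ' '
print(repr(new_s))        # 'meet me at 5\n '
```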

The simplest solution is to change line.split(' ') to line.split(). Without any arguments, split() splits on any run of whitespace, so the trailing newline is discarded too. You'll also need to remove the end='' from your print so that the newlines are added back in.
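
Applied to the loop from the question, the change could look like the sketch below. The function name strip_numbers_in_place is hypothetical and only wraps the loop so it can be pointed at any file; everything else follows the question's code.

```python
import fileinput


def strip_numbers_in_place(path):
    """Rewrite the file at `path` without standalone numbers (hypothetical helper)."""
    # With inplace=True, everything printed inside the loop goes to the file.
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            new_s = ""
            for word in line.split():   # no argument: the newline is discarded
                if not all(char.isdigit() for char in word):
                    new_s += word
                    new_s += ' '
            print(new_s)                # print adds the newline back
```

Calling strip_numbers_in_place("input.txt") on the sample file removes "4" and "5", but each line still ends with one trailing space, because a space is appended after the last word as well.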

There's also an extra space added at the end of each line (before the new line) that should be dealt with, possibly by using `print(new_s[:-1])` — yinnonsanders, Nov 15 '17 at 15:03
@yinnonsanders Or by storing the words for each line in a list and doing a `' '.join()`. — glibdud, Nov 15 '17 at 15:03
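
A minimal sketch of the `' '.join()` idea from the comment above, written as a standalone function so it can be tried without touching a file. The name drop_standalone_numbers is made up for illustration. It uses word.isdigit(), which for the non-empty words produced by split() gives the same result as the per-character check in the question.

```python
def drop_standalone_numbers(line):
    """Return `line` without words made up entirely of digits (hypothetical helper)."""
    # split() with no argument discards the newline and any repeated spaces.
    kept = [word for word in line.split() if not word.isdigit()]
    # join() puts a space between words only, so nothing trails at the end.
    return ' '.join(kept)


for sample in ["I listened to 4 u2 albums today\n", "meet me at 5\n", "squad 4ever\n"]:
    print(repr(drop_standalone_numbers(sample)))
# 'I listened to u2 albums today'
# 'meet me at'
# 'squad 4ever'
```

Note that this also collapses runs of whitespace and drops leading indentation, which is fine for the sample file but may matter for other input.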

score 1 · Answer 2 · fievel · 2017-11-15T15:00:10.447

Just use regexp.

re.sub(r"\b\d+\b", "", input)

This matches any run of digits that sits between word boundaries.

Or to avoid double spaces:

re.sub(r"\s\d+\s", " ", input)
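
To see how the two patterns behave on the sample text, here is a small experiment. The third pattern is not from the answer; it is an assumed variant that removes the spaces or tabs in front of the number instead of the whitespace around it, so newlines are left alone.

```python
import re

text = "I listened to 4 u2 albums today\nmeet me at 5\nsquad 4ever\n"

# First pattern: the number goes, but the spaces on both sides stay.
only_number = re.sub(r"\b\d+\b", "", text)
print(repr(only_number))
# 'I listened to  u2 albums today\nmeet me at \nsquad 4ever\n'

# Second pattern: \s also matches the newline after "5", joining two lines.
with_spaces = re.sub(r"\s\d+\s", " ", text)
print(repr(with_spaces))
# 'I listened to u2 albums today\nmeet me at squad 4ever\n'

# Assumed variant: consume only the spaces or tabs before the number.
leading_only = re.sub(r"[ \t]*\b\d+\b", "", text)
print(repr(leading_only))
# 'I listened to u2 albums today\nmeet me at\nsquad 4ever\n'
```

One caveat: \b treats punctuation as a word boundary, so with the \b patterns a value such as 3.14 would lose its digits and keep only the dot. Splitting on whitespace does not have that problem.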

edited Nov 15 '17 at 15:00

answered Nov 15 '17 at 14:53

That kind of works, but it leaves a white space instead of nothing when replacing a number, which turns "I listened to 4 u2 albums today" into "I listened to  u2 albums today", with 2 spaces between "to" and "u2". Any way to fix this? – Skum Nov 15 '17 at 14:57
Edited with a solution – fievel Nov 15 '17 at 15:00

score 0 · Answer 3 · answered Nov 15 '17 at 14:53

You can use regex:

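import re
with open('file.txt') as f:  # the with-block closes the file
    data = f.read()
# Drop each whitespace-delimited number plus the spaces or tabs before it;
# newlines are not touched, so lines are never joined.
data = re.sub(r'[ \t]*(?<!\S)\d+(?!\S)', '', data)

Output:

I listened to u2 albums today
meet me at
squad 4ever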

Ajax1234
