I'm trying to remove every word that starts with a certain string in a text file. I'm stuck on how to write to the output file.

Input file:

Lorem ipsum applePEAR
dolor appleBANANA sit 
appleORANGE amet, consectetur

Desired output file:

Lorem ipsum 
dolor sit
amet, consectetur

My approach so far:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        for word in ls():
            if word.startswith("apple"):
                line.replace(word, "")
        fout.write(line)

I think the problem is that I'm working on the words in the split list rather than on the line itself.

Checking Stack Overflow, I see this problem is similar to "using Python for deleting a specific line in a file", except that instead of a fixed "nickname_to_delete" I need to remove any word that starts with a given string.

3 Answers

I've updated your code as little as I could:

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()  # Split on any whitespace; this also drops the trailing newline
        newline = []
        for word in ls:  # Don't call() the list
            if not word.startswith("apple"):
                newline.append(word)  # Keep all words that don't start with "apple".
        fout.write(" ".join(newline) + "\n")  # Rebuild the line and restore the newline

Keep in mind that a regex replacement would be better, since it also handles a case like "newword,appleshake", where the word follows punctuation rather than whitespace:

import re

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        # \w* (not \w+) so that a bare "apple" is removed as well
        fout.write(re.sub(r"\bapple\w*", "", line))

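Punctuation is still a problem: \w does not match punctuation, so a word such as "apple-pie" is not removed cleanly, and the whitespace around a removed word is left behind. You need to decide how you want to handle those cases.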
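
As a rough sketch of one way to deal with the leftover whitespace, the pattern can also consume a single space or tab after the removed word. The function name remove_prefixed_words and its prefix parameter are made up for illustration, and this is only one possible policy, not the definitive one:

```python
import re

def remove_prefixed_words(line, prefix="apple"):
    # \b anchors the match at the start of a word, \w* consumes the rest
    # of the word, and [ \t]? swallows at most one following space or tab
    # (never the newline, so line endings survive).
    pattern = rf"\b{re.escape(prefix)}\w*[ \t]?"
    return re.sub(pattern, "", line)

print(repr(remove_prefixed_words("dolor appleBANANA sit \n")))
print(repr(remove_prefixed_words("newword,appleshake\n")))
```

Trailing whitespace that was already in the input is kept, and "pineapple" is left alone because \b does not match in the middle of a word.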

Bharel

There are a few problems.

  • You are calling ls() - should be just ls
  • Calling line.replace() (aside from the typo) does not modify the contents of line - it simply returns a new string, which you are then discarding
  • There is a risk in principle that by doing the replace in this way, you will also delete parts of other words unintentionally - in the line "I like pineapples and apples", the "apples" in "pineapples" would also get deleted ("I like pine and ").
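
The last two points can be checked with a small experiment. This is only an illustrative sketch using the example sentence from above; the variable names are arbitrary:

```python
line = "I like pineapples and apples"

# str.replace returns a new string; the original is left untouched.
replaced = line.replace("apples", "")
print(repr(line))      # still the full sentence
print(repr(replaced))  # "pineapples" has lost its "apples" too

# Filtering whole words avoids touching "pineapples".
filtered = " ".join(w for w in line.split() if not w.startswith("apple"))
print(repr(filtered))
```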

Here is an alternative (note limitation: the amount of whitespace between words is not preserved).

with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        ls = line.split()
        words = [word for word in ls if not word.startswith('apple')]
        line_out = ' '.join(words)
        fout.write(line_out + '\n')
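
If the original spacing matters, one possible workaround is a regex that removes only the matching words and leaves all whitespace where it was. This is a sketch under the assumption that a "word" means a whitespace-delimited token, as with split(); the function name drop_words_keep_spacing is hypothetical:

```python
import re

def drop_words_keep_spacing(line, prefix="apple"):
    # (?<!\S) matches only at the start of the line or right after
    # whitespace, and \S* consumes the rest of the token. Nothing else
    # in the line is modified, so the spacing is preserved.
    pattern = rf"(?<!\S){re.escape(prefix)}\S*"
    return re.sub(pattern, "", line)

print(repr(drop_words_keep_spacing("I like  pineapples and   apples\n")))
```

The spaces on both sides of a removed word are kept, so gaps can end up wider than before.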
alani

filter can also be used:

word="apple" 
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
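        # Keep only the words that do not start with the given prefix
        string_iterable = filter(lambda x: not x.startswith(word), line.split())
        fout.write(" ".join(string_iterable) + "\n")  # Restore the newline removed by split()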
Vishesh Mangla