How do I do this in python?

badphrases.txt contains

Go away
Don't do that
Stop it

allphrases.txt contains

I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice

I want to remove from allphrases.txt every line that contains a phrase from badphrases.txt.

It's trivial in bash:

cat badphrases.txt | while read b
do
cat allphrases.txt | grep -v "$b" > tmp
cat tmp > allphrases.txt
done

Oh, you thought I hadn't looked or tried. I searched for over an hour.

Here's my code:

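import os

# Files
ttv = "/tmp/tv.dat"
tmp = "/tmp/tempfile"
bad = "/tmp/badshows"

# badfiles already exists
# ...code right here creates ttv

# Function grep_v: True if any non-empty line of file f occurs inside text
def grep_v(f, text):
    with open(f, "r") as file:
        for line in file:
            phrase = line.strip()  # drop the trailing newline before comparing
            if phrase and phrase in text:
                return True
    return False

# Copy every line of ttv that contains no bad phrase into tmp
with open(tmp, 'w') as t, open(ttv, "r") as tfile:
    for line in tfile:
        if not grep_v(bad, line):
            t.write(line)
# Both files are closed here, so the rename sees the complete output
os.rename(tmp, ttv)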
W. Hunk

2 Answers

First, google how to read a file in Python.

You will probably find something like this: How do I read a file line-by-line into a list?

Use this to read both files into lists:

with open('badphrases.txt') as f:
    content = f.readlines()
badphrases = [x.strip() for x in content] 

with open('allphrases.txt') as f:
    content = f.readlines()
allphrases = [x.strip() for x in content] 

Now you have the contents of both files in lists.

Iterate over allphrases and check whether any phrase from badphrases is present in each line.

At this point you might google:

  • how to iterate over a list python
  • how to check if string present in another string python

Take the code from those places and build a brute-force algorithm like this:

for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)

If you can understand this code then you will notice you need to replace print with output to file:

  • Now google how to print to file python.
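
Putting those steps together, a minimal sketch might look like the following. The function name `remove_bad_lines` and its parameters are made up for illustration. It assumes both files fit in memory and that plain, case-sensitive substring matching is what you want, as in the loop above.

```python
def remove_bad_lines(all_path, bad_path):
    """Rewrite all_path without the lines that contain any phrase from bad_path."""
    # Read the bad phrases, skipping blank lines
    # (an empty phrase would match every line)
    with open(bad_path) as f:
        badphrases = [x.strip() for x in f if x.strip()]

    # Read everything before reopening the same file for writing
    with open(all_path) as f:
        lines = f.readlines()

    kept = [line for line in lines
            if not any(bad in line for bad in badphrases)]

    with open(all_path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)  # number of lines removed
```

Calling `remove_bad_lines("allphrases.txt", "badphrases.txt")` would overwrite allphrases.txt, so try it on a copy first.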

Then think about how to improve the algorithm. All the best.
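
One possible improvement, as a sketch only: combine all bad phrases into a single compiled regular expression, so each line is scanned once instead of once per phrase. The names `build_matcher` and `filter_lines` are hypothetical, and whether this is actually faster for your data is something to measure rather than assume.

```python
import re

def build_matcher(badphrases):
    """Compile one pattern that matches any of the given phrases literally."""
    phrases = [p for p in badphrases if p]  # ignore empty phrases
    if not phrases:
        return None
    # re.escape keeps characters such as '.' or '?' from acting as regex syntax
    return re.compile("|".join(re.escape(p) for p in phrases))

def filter_lines(lines, badphrases):
    """Return the lines that contain none of the bad phrases."""
    matcher = build_matcher(badphrases)
    if matcher is None:
        return list(lines)
    return [line for line in lines if matcher.search(line) is None]
```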

UPDATE:

@COLDSPEED suggested you can simply google "how to replace lines in a file in python":

You might get something like this: Search and replace a line in a file in Python

Which also works.
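
For illustration, one common way to edit a file in place is the standard library's `fileinput` module. The sketch below is an assumption about how that could be applied to this problem, not a quote from the linked question; `remove_bad_lines_inplace` is a made-up name.

```python
import fileinput

def remove_bad_lines_inplace(all_path, badphrases):
    """Filter all_path in place, dropping lines that contain a bad phrase."""
    # With inplace=True, fileinput redirects print() output
    # into the file that is currently being read
    with fileinput.input(files=[all_path], inplace=True) as f:
        for line in f:
            if not any(bad in line for bad in badphrases):
                print(line, end="")  # line already ends with its newline
```

Here `badphrases` is the stripped list read from badphrases.txt earlier in this answer.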

Vikash Singh
  • Might as well just ask the user to google "how to replace lines in a file in python". – cs95 Aug 08 '17 at 05:01
  • There might be a 100 ways to do this. Clearly s/he is trying to learn python. so giving some basic tips. – Vikash Singh Aug 08 '17 at 05:02
  • Sometimes things are just way easier in another language. Whatever the solution in python is, it's bound to be much more complicated than it needs to be. I don't get why python is popular. – W. Hunk Aug 08 '17 at 06:11
  • @W.Hunk What if the file badfiles.txt were only available over an API? Your problem is very simple, hence a shell script can solve it better and faster. Python can be used to write a very complicated web application; can you do that in shell? (Maybe, but not as securely and easily as in python.) – Vikash Singh Aug 08 '17 at 06:18
  • Not all problems are to be solved with python/C/C++/Shell/Perl/Java. If you know the basics of each of these languages, then based on the situation you can easily choose which one to use, and you will have a good reason why. Hope that helps. – Vikash Singh Aug 08 '17 at 06:20

Solution not too bad.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import feedparser, os, re

# Files
h = os.environ['HOME']
ttv = h + "/WEB/Shows/tv.dat"
old = h + "/WEB/Shows/old.dat"
moo = h + "/WEB/Shows/moo.dat"
tmp = h + "/WEB/Shows/tempfile"
bad = h + "/WEB/Shows/badshows"

# Function not_present: True if text occurs in no line of file f
def not_present(f, text):
    with open(f, "r") as file:  # closed automatically, even on early return
        for line in file:
            if text in line:
                return False
    return True

# Sources (shortened)
sources = ['http://predb.me/?cats=tv&rss=1']

# Grab all the feeds and put them into ttv and old
with open(old, 'a') as k, open(ttv, 'a') as f:
    for url in sources:  # 'url' avoids overwriting h, the home directory
        d = feedparser.parse(url)
        for post in d.entries:
            if not_present(old, post.link):
                entry = post.title + "|" + post.link + "\n"
                f.write(entry)
                k.write(entry)

# Remove shows without [Ss][0-9] and put them in moo
with open(moo, 'a') as m, open(tmp, 'w') as t, open(ttv, "r") as src:
    for line in src:
        if re.search(r's[0-9]', line, re.I) is None:
            m.write(line)
#            print("moo", line)
        else:
            t.write(line)
#            print("tmp", line)
# All three files are closed here, so tmp is fully written before the rename
os.rename(tmp, ttv)

# Remove badshows
with open(bad) as f:
    # skip blank lines: an empty phrase would match every show
    bap = [x.strip() for x in f if x.strip()]

with open(ttv) as f:
    shows = [x.strip() for x in f]

with open(tmp, 'w') as t:
    for line in shows:
        # keep the line only if no bad phrase occurs in it
        if not any(b in line for b in bap):
            t.write(line + "\n")
# tmp is closed and flushed before it replaces ttv
os.rename(tmp, ttv)
W. Hunk
  • See, python is not so bad after all. Also, you can improve `# Function not_present`. Currently it reads the file on every call. Read the file once and store it in a list; when the function is called, check against that list. – Vikash Singh Aug 08 '17 at 07:44
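
The suggestion in that last comment could be sketched roughly as follows: read the links already recorded in old once, keep them in memory, and test membership there. The helpers `load_seen_links` and `new_entries` are hypothetical, and the sketch assumes each line has the `title|link` layout written by the script above. It uses a set rather than a list and matches the link exactly, which is slightly stricter than the substring test in not_present.

```python
def load_seen_links(path):
    """Return the set of links in a 'title|link' file; empty if the file is missing."""
    seen = set()
    try:
        with open(path) as f:
            for line in f:
                # the link is everything after the last '|' on the line
                title, sep, link = line.rstrip("\n").rpartition("|")
                if sep:
                    seen.add(link)
    except FileNotFoundError:
        pass
    return seen

def new_entries(posts, seen):
    """Yield (title, link) pairs whose link is not in seen, recording each one."""
    for title, link in posts:
        if link not in seen:
            seen.add(link)  # also catches duplicates within the same run
            yield title, link
```

In the feed loop, `posts` could be built as `((p.title, p.link) for p in d.entries)`, so the old file is read once per run instead of once per post.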