Open links from txt file in python

Question

I would like to ask for help with an RSS program. I am collecting sites that contain information relevant to my project and then checking whether they have RSS feeds. The links are stored in a txt file (one link per line), so I have a txt file full of base URLs that need to be checked for RSS.

I have found this code which would make my job much easier.

import requests
from bs4 import BeautifulSoup

def get_rss_feed(website_url):
    if website_url is None:
        print("URL should not be null")
    else:
        source_code = requests.get(website_url)
        plain_text = source_code.text
        # Name the parser explicitly; bs4 warns when it has to guess one
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.find_all("link", {"type" : "application/rss+xml"}):
            href = link.get('href')
            print("RSS feed for " + website_url + "is -->" + str(href))

get_rss_feed("http://www.extremetech.com/")

But I would like to read my collected URLs from the txt file rather than typing them in one by one.

So I have tried to extend the program with this:

from bs4 import BeautifulSoup, SoupStrainer

with open('test.txt','r') as f:
    for link in BeautifulSoup(f.read(), parse_only=SoupStrainer('a')): 
        if link.has_attr('http'): 
            print(link['http'])

But this returns an error saying that BeautifulSoup is not an HTTP client.

I have also extended with this:

def open():
    f = open("file.txt")
    lines = f.readlines()
    return lines

But this gave me a list separated with ",".

I would be really thankful if someone could help me.

score 1 · Answer 1 · answered Jun 24 '16 at 21:00

Typically you'd do something like this:

with open('links.txt', 'r') as f:
    for line in f:
        get_rss_feed(line)

Also, it's a bad idea to define a function with the name open unless you intend to replace the builtin function open.
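
One caveat about the loop above: iterating over a file yields each line with its trailing newline, so `get_rss_feed` receives `"http://.../\n"` rather than the bare URL, which may make the request fail or return nothing useful. A minimal sketch of stripping the lines first; the helper names `clean_urls` and `read_urls` are made up for illustration, and `links.txt` is assumed to hold one URL per line:

```python
def clean_urls(lines):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def read_urls(path):
    """Read a text file with one URL per line and return the cleaned URLs."""
    with open(path, "r") as f:
        # A file object is an iterable of lines, each ending in "\n"
        return clean_urls(f)


# Intended use with the function from the question:
# for url in read_urls("links.txt"):
#     get_rss_feed(url)
```

Skipping blank lines is a design choice here, so that an empty line at the end of the file does not trigger a request.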

nemetroid

Thank you, I will give it a try. Thanks for the advice about open, I had missed it. – Platy Jun 24 '16 at 21:06
I have inserted your suggested code into the program. Now it runs without any error message, but also without results: `root@loko:~# sudo python /root/Desktop/rsskeres.py root@loko:~# sudo python /root/Desktop/rsskeres.py`. If I print the lines from your code, I get the URL: `root@loko:~# sudo python /root/Desktop/nyit3.py http://www.theguardian.com/`. And this is what the original program returns: `root@loko:~# sudo python /root/Desktop/rsskeres.py RSS feed for http://www.theguardian.com/is --> http://www.theguardian.com/international/rss`. What could be the problem? – Platy Jun 24 '16 at 21:51
I imagine you would want `line.rstrip()` – Padraic Cunningham Jun 24 '16 at 23:20

danielarend · Answer 2 · 2016-06-24T21:17:30.527

0

I guess you can do it using urllib:

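    # Python 3: urlopen lives in urllib.request (urllib.urlopen was Python 2 only)
    import urllib.request

    # Assumes one URL per line in test.txt
    with open('test.txt', 'r') as f:
        for line in f:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            mycontent = urllib.request.urlopen(url).read()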
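
When many URLs are read from a file, one unreachable or malformed entry would otherwise raise and stop the whole loop. A rough sketch of one way to keep going and collect the failures; the function name `fetch_all` and its `fetch` parameter are hypothetical, and the 10-second timeout is an arbitrary assumption:

```python
import urllib.request


def fetch_all(urls, fetch=None):
    """Fetch every URL, returning (pages, failures) instead of stopping on errors."""
    if fetch is None:
        # Default fetcher: plain urllib with an assumed 10 second timeout
        def fetch(url):
            return urllib.request.urlopen(url, timeout=10).read()

    pages, failures = {}, {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except (OSError, ValueError) as exc:
            # urllib's URLError/HTTPError and socket timeouts are OSError
            # subclasses; urlopen raises ValueError for strings that are not URLs
            failures[url] = str(exc)
    return pages, failures
```

The `fetch` argument exists only so the loop can be tried without network access by passing a stand-in function.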

edited Jun 24 '16 at 21:17

answered Jun 24 '16 at 21:00

Thanks for the fast help! And where should I define the location of the txt file? – Platy Jun 24 '16 at 21:03
If you have problems iterating through the text file, check this: [http://stackoverflow.com/a/5733487/6495164] – danielarend Jun 24 '16 at 21:10
