import requests
import xml.etree.ElementTree as ET
import re

gen_news_list=[]
r_milligenel = requests.get('http://www.milliyet.com.tr/D/rss/rss/Rss_4.xml')
root_milligenel = ET.fromstring(r_milligenel.text)

for entry in root_milligenel:
    for channel in entry:
        for item in channel:
            title = re.search(".*title.*",item.tag)
            if title:
                gen_news_list.append(item.text)
            link = re.search(".*link.*",item.tag)
            if link:
                gen_news_list.append(item.text)
                r = requests.get(item.text)
                print(r.text)

I have a list named gen_news_list, and I'm trying to append titles, summaries, links, etc. to it. But an error occurs when I request one of the links:

  Traceback (most recent call last):
  File "/home/deniz/Masaüstü/Çalışmalar/Python/Bot/xmlcek.py", line 23, in <module>
    r = requests.get(item.text)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 456, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 553, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 608, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '
http://www.milliyet.com.tr/tbmm-baskani-cicek-programlarini/siyaset/detay/2037301/default.htm

The first link was requested successfully, but the second one raises an error, so I can't add its content to the list. Is it a problem with my loop? What's wrong with the code?

mehardxx
  • What is the content of `item.text` just before the line `r = requests.get(item.text)`? – halex Apr 01 '15 at 09:10
  • Could you print the `repr` version of the URL that results in the error? I looked at the other questions that produce the same error but this one appears to me to be caused by the URL starting with a newline. – Dan D. Apr 01 '15 at 09:10
  • item.text is the content of the XML tag. In this code that is the link I want to request ("http://www.milliyet.com.tr"). The first link worked fine. – mehardxx Apr 01 '15 at 09:18

1 Answer


If you add the line print(repr(item.text)) before the problematic line r = requests.get(item.text), you will see that from the second link onward item.text has \n at the beginning and the end, which is not allowed in a URL.

'\nhttp://www.milliyet.com.tr/tbmm-baskani-cicek-programlarini/siyaset/detay/2037301/default.htm\n'

I use repr because it literally shows the newline as the string \n in its output.
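As a rough illustration of why the leading newline matters, here is a minimal sketch that uses only plain string operations. It assumes that the requests version in the traceback picks a connection adapter by checking whether the URL starts with a known prefix such as "http://"; the variable names are hypothetical.

```python
# The URL as it comes out of the XML, with the stray newlines around it.
raw_url = "\nhttp://www.milliyet.com.tr/tbmm-baskani-cicek-programlarini/siyaset/detay/2037301/default.htm\n"

# print() renders the newlines as blank lines, so they are easy to miss.
print(raw_url)

# repr() shows them explicitly as \n.
print(repr(raw_url))

# Assumption: the adapter lookup is a prefix match on the URL.
# With the leading newline the URL no longer starts with "http://".
matches_before = raw_url.startswith("http://")
matches_after = raw_url.strip().startswith("http://")
print(matches_before, matches_after)
```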

The solution to your problem is to call strip on item.text to remove those newlines:

r = requests.get(item.text.strip())
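
The same idea can be applied to every value read from the feed, not only the link. The following is a sketch only: it parses a small made-up feed (SAMPLE_RSS is hypothetical, and the real Milliyet feed may be laid out differently) and uses a hypothetical helper, extract_items, that strips whitespace from each title and link. No network request is made here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the second item has newlines around its text,
# like the link that caused the error.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>
        Second story
      </title>
      <link>
http://www.example.com/second
</link>
    </item>
  </channel>
</rss>"""


def extract_items(xml_text):
    """Return a list of (title, link) tuples with surrounding whitespace removed."""
    root = ET.fromstring(xml_text)
    items = []
    # iter("item") visits every <item> element regardless of nesting depth.
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


for title, link in extract_items(SAMPLE_RSS):
    print(repr(title), repr(link))
```

Each cleaned link could then be passed to requests.get(link) as in the line above.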
halex