I am building a Flask API that fetches news from the RSS feeds of various news sites. Most requests succeed, but every so often one fails with a 500 Internal Server Error, and this gets logged in the console:

[2021-08-29 16:49:40,852] ERROR in app: Exception on /world [GET]
Traceback (most recent call last):
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/flask/views.py", line 83, in view
    return self.dispatch_request(*args, **kwargs)
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/Users/ragz/dev/Python/news_summary/src/python/api.py", line 14, in get
    worldnews = feedparser.parse(random.choice(list(source.world_news)))
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/feedparser/api.py", line 216, in parse
    data = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers, result)
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/feedparser/api.py", line 115, in _open_resource
    return http.get(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers, result)
  File "/Users/ragz/Library/Python/3.8/lib/python/site-packages/feedparser/http.py", line 172, in get
    data = f.read()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/http/client.py", line 471, in read
    s = self._safe_read(self.length)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/http/client.py", line 614, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(1606 bytes read, 2306 more expected)

I looked through other Stack Overflow answers but didn't find anything closely related. Does anyone know what is causing this error?

Here is my code:

import feedparser
from flask import Flask
from flask_restful import Resource, Api, reqparse
import random
import source
import requests

app = Flask(__name__)
api = Api(app)

class WorldNews(Resource):
    def get(self):
        # Pick a random world-news feed, then a random entry from it
        worldnews = feedparser.parse(random.choice(list(source.world_news)))
        entry = random.choice(list(worldnews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(WorldNews, '/world')  # '/world' is the endpoint for this resource

class TechNews(Resource):
    def get(self):
        # Pick a random tech feed, then a random entry from it
        technews = feedparser.parse(random.choice(list(source.tech_sources)))
        entry = random.choice(list(technews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(TechNews, '/tech')

class BusinessNews(Resource):
    def get(self):
        # Pick a random business feed, then a random entry from it
        businessnews = feedparser.parse(random.choice(list(source.business)))
        entry = random.choice(list(businessnews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(BusinessNews, '/business')

class SportsNews(Resource):
    def get(self):
        # Pick a random sports feed, then a random entry from it
        sportsnews = feedparser.parse(random.choice(list(source.sports)))
        entry = random.choice(list(sportsnews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(SportsNews, '/sports')

class ScienceNews(Resource):
    def get(self):
        # Pick a random science feed, then a random entry from it
        sciencenews = feedparser.parse(random.choice(list(source.science)))
        entry = random.choice(list(sciencenews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(ScienceNews, '/science')

class HealthNews(Resource):
    def get(self):
        # Pick a random health feed, then a random entry from it
        healthnews = feedparser.parse(random.choice(list(source.health)))
        entry = random.choice(list(healthnews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(HealthNews, '/health')

class EntertainmentNews(Resource):
    def get(self):
        # Pick a random entertainment feed, then a random entry from it
        entertainmentnews = feedparser.parse(random.choice(list(source.entertainment)))
        entry = random.choice(list(entertainmentnews.entries))
        title = entry.title
        summary = entry.summary
        date = entry.published
        link = entry.link
        return {'title': title, 'summary': summary, 'date': date, 'link': link}, 200  # return data and 200 OK code

api.add_resource(EntertainmentNews, '/entertainment')




if __name__ == '__main__':
    app.run()  

I also have a dictionary with different RSS feeds for each category.

bigman1234

1 Answer

I built a web app with similar functionality a while ago and hit the same error. In my case it was caused by the connection dropping data: feedparser starts downloading the feed, the connection drops before the whole response body has arrived, and the read fails with IncompleteRead. Your traceback shows the same thing: 1606 bytes were read and 2306 more were expected.
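To make the failure mode concrete, here is a small standard-library sketch that reproduces the exception without touching the network. The FakeSocket class and the hand-written response bytes are my own illustrative assumptions, not part of feedparser or of the code in the question. The response announces a 10-byte body but delivers only 3 bytes, which is roughly what happens when a feed's connection drops mid-download.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, payload):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        # http.client reads the response through this file-like object
        return self._file


def read_truncated_response():
    # The headers promise 10 bytes, but the body stops after 3
    raw = b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nabc"
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()  # parse the status line and headers
    try:
        response.read()
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes that did arrive,
        # exc.expected the number of bytes still missing
        return exc.partial, exc.expected
    return None


if __name__ == "__main__":
    print(read_truncated_response())
```

The two values in the exception correspond to the "bytes read" and "more expected" figures in the traceback.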

Since you are picking sources at random, it looks like you can simply fall back to a different source when one fails. I would recommend doing something like the following, and logging the sources that cause this error to see whether there are any repeat offenders. If there are, remove them from your sources.

So:

# Pick two distinct sources up front: one to try first and one as a backup.
# random.sample (unlike random.choices) never returns the same source twice.
first_one_to_try, backup = random.sample(list(set(source.world_news)), k=2)
try:
    worldnews = feedparser.parse(first_one_to_try)
except Exception as e:
    print(e)  # do this first to see which exception is raised, then narrow Exception to that type
    # then log the results
    app.logger.error("Connection error while parsing feed {}".format(first_one_to_try))
    worldnews = feedparser.parse(backup)

Of course, a risk of doing this is that your backup choice may lead to the same error. If that's an issue, I would extract the try/except logic into its own method and apply it every time you sample a source.
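As a rough sketch of that extraction (the name parse_with_fallback, the max_attempts parameter and the choice of caught exceptions are my own assumptions, so adjust them to what you actually see in your logs), the helper below tries several distinct, randomly chosen sources in turn. The parser is passed in as an argument, so in your app you would hand it feedparser.parse.

```python
import http.client
import logging
import random

logger = logging.getLogger(__name__)


def parse_with_fallback(sources, parse, max_attempts=3):
    """Try distinct random sources until one is fetched without a dropped connection.

    `parse` is the callable that fetches and parses a feed (assumed to be
    feedparser.parse in the real app); `sources` is any collection of feed URLs.
    """
    pool = list(sources)
    # Sample without replacement so the same source is never tried twice
    candidates = random.sample(pool, k=min(max_attempts, len(pool)))
    last_error = None
    for url in candidates:
        try:
            return parse(url)
        except (http.client.IncompleteRead, OSError) as exc:
            # Log the offender so repeat failures can be spotted and pruned
            logger.error("Connection error while parsing feed %s: %s", url, exc)
            last_error = exc
    raise RuntimeError("all candidate feeds failed") from last_error
```

Inside get() this would be called as parse_with_fallback(source.world_news, feedparser.parse). If every candidate fails, the RuntimeError still reaches Flask and produces a 500, so you may want to catch it there and return a friendlier response.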

Perhaps a better approach is to retry automatically when a request fails. I built a bit more resilience into my own app with tenacity.

Something like this should do the trick:

from tenacity import retry, stop_after_attempt


@retry(stop=stop_after_attempt(5))  # retry on any exception, give up after 5 attempts
def get(self):
    ...  # existing body of get(), unchanged
osint_alex
    Wow, thanks! The tenacity approach helped me a lot. I was trying to solve this with similar logic but couldn't get it to work, and I didn't know this library existed! – bigman1234 Aug 31 '21 at 10:37