
What I'm making:

An RSS reader with a small, fixed set of feeds (10 to 15).

The problem:

When I hit refresh in the browser, the page takes around 15 seconds to load.

I know that most of the loading time is spent waiting for the server to iterate over every feed and fetch all the entries from each one.

Maybe AJAX could be the solution?


Code:

This is the view:

@app.route('/')
def index():
    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
        'http://www.polygon.com/rss/group/news/index.xml',
        ]

    entries = []
    for url in RSS_URLS:
        entries.extend(feedparser.parse(url).entries)

    entries_sorted = sorted(
        entries,
        key=lambda e: e.published_parsed,
        reverse=True)

    return render_template(
        'index.html',
        entries=entries_sorted
        )

And this is the template:

{% block content %}
    <div class="row">
    {% for e in entries %}
        <div class="col-md-4 col-lg-3">
            <h1><a href="{{ e.link }}">{{ e.title }}</a></h1>
            <h5>Published on: {{ e.published }}</h5>
            {% for content in e.content %}
                <p>{{ content.value|safe }}</p>
            {% else %}
                <p>{{ e.summary_detail.value|safe }}</p>
            {% endfor %}
        </div>
    {% endfor %}
    </div>
{% endblock %}
Santiago Quiroga

1 Answer


You can fetch the feeds in parallel. See "Practical threaded programming with Python" and Eventlet for background; here is a code example:

import feedparser

def parallel_with_gevent():
    import gevent.monkey
    gevent.monkey.patch_all()
    from gevent.pool import Pool

    # limit ourselves to max 10 simultaneous outstanding requests
    pool = Pool(10)

    def handle_one_url(url):
        parsed = feedparser.parse(url)
        if parsed.entries:
            print('Found entry:', parsed.entries[0])

    for url in LIST_OF_URLS:
        pool.spawn(handle_one_url, url)
    pool.join()
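
If you would rather not add gevent, here is a minimal sketch that does the same thing with the standard library's concurrent.futures (Python 3). The fetch_all() helper name is my own, not part of your code:

import feedparser
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, max_workers=10):
    """Parse every feed in worker threads and return one flat list of entries."""
    entries = []
    # feedparser.parse() spends most of its time blocked on network I/O,
    # so running the calls in threads overlaps those waits
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for parsed in executor.map(feedparser.parse, urls):
            entries.extend(parsed.entries)
    return entries

In your view you would then replace the for loop with entries = fetch_all(RSS_URLS).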

I use a cache file in scenarios like this.

import os
import time

def update_cache(tmp_file, cache):
    """ logic to update cache """
    pass

def return_cache(tmp_file, update_time_sec):
    # serve the cache only while the file is younger than update_time_sec;
    # how the entries are written and read back is up to update_cache()
    if os.path.getctime(tmp_file) > (time.time() - update_time_sec):
        with open(tmp_file, "r") as data:
            return data.read()
    else:
        return None

@app.route('/')
def index():
    # serve the cached entries if the cache file is still fresh
    entries_sorted = return_cache(tmp_file, update_time_sec)
    if entries_sorted is not None:
        return render_template(
            'index.html',
            entries=entries_sorted
            )

    RSS_URLS = [
        'http://feeds.feedburner.com/RockPaperShotgun',
        'http://www.gameinformer.com/b/MainFeed.aspx?Tags=preview',
        'http://www.polygon.com/rss/group/news/index.xml',
        ]

    entries = []
    for url in RSS_URLS:
        entries.extend(feedparser.parse(url).entries)

    entries_sorted = sorted(
        entries,
        key=lambda e: e.published_parsed,
        reverse=True)

    # refresh the cache with the newly fetched entries
    update_cache(tmp_file, entries_sorted)
    return render_template(
        'index.html',
        entries=entries_sorted
        )
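
For completeness, a minimal sketch of what update_cache()/return_cache() could look like, using pickle for serialization; the file path and freshness window below are assumptions, not values from the code above:

import os
import time
import pickle

tmp_file = '/tmp/rss_cache.pkl'   # assumed cache location
update_time_sec = 300             # assumed freshness window: 5 minutes

def update_cache(tmp_file, cache):
    # overwrite the cache file with the freshly fetched, sorted entries
    with open(tmp_file, 'wb') as f:
        pickle.dump(cache, f)

def return_cache(tmp_file, update_time_sec):
    # return the cached entries only if the file exists and is still fresh
    if os.path.exists(tmp_file) and os.path.getmtime(tmp_file) > time.time() - update_time_sec:
        with open(tmp_file, 'rb') as f:
            return pickle.load(f)
    return None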
Valeriy Solovyov
  • return_cache() checks whether the cache file was changed less than update_time_sec ago. If it was, it returns the content of the file; if not, it returns None. In the main code we receive entries_sorted = return_cache(); if it is None, your original code runs. At the end I added update_cache(), which must rewrite the cache file with the new data. – Valeriy Solovyov Feb 25 '15 at 13:52
  • I added links with a parallel example. You can fetch and parse the feeds simultaneously. – Valeriy Solovyov Feb 25 '15 at 14:06