6

I have the following URL configuration:

url(r'^sitemap\.xml$', index, {'sitemaps': sitemaps}),
url(r'^sitemap-(?P<section>.+)\.xml', cache_page(86400)(sitemap), {'sitemaps': sitemaps}),

and `sitemaps` includes the following sitemap:

class ArticlesDetailSiteMap(Sitemap):
    changefreq = "daily"
    priority = 0.9

    def items(self):
        return Article.objects.filter(is_visible=True, date_published__lte=timezone.now())

However, there are more than 50,000 articles, so I get a timeout error when I request /sitemap-articles.xml, because it tries to fetch all of the articles at once.

Any ideas on how I should create an index and make pagination work here, as described in the documentation below?

https://docs.djangoproject.com/en/dev/ref/contrib/sitemaps/#creating-a-sitemap-index

tuna

2 Answers

5

I set `limit = 5000` on the sitemap class, and that resolved the issue.

class ArticlesDetailSiteMap(Sitemap):
    changefreq = "daily"
    priority = 0.9
    limit = 5000

    def items(self):
        return Article.objects.filter(is_visible=True, date_published__lte=timezone.now())

With this, Django created paginated URLs covering all articles, 5,000 per page.
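To illustrate what the index ends up listing once `limit` is set, here is a minimal sketch in plain Python (no Django required). The domain `https://example.com`, the count of 52,000 articles, and the helper name `sitemap_index_urls` are assumptions made up for illustration. The `?p=N` query parameter is how Django's sitemap views address pages after the first.

```python
import math


def sitemap_index_urls(base_url, section, item_count, limit):
    """List the URLs a sitemap index would contain for one paginated section.

    Hypothetical helper: it only mirrors the URL pattern from the question
    (sitemap-<section>.xml) and Django's ?p=N page parameter.
    """
    # An empty section still yields one (empty) page.
    num_pages = max(1, math.ceil(item_count / limit))
    first = f"{base_url}/sitemap-{section}.xml"
    # Page 1 has no query string; later pages are addressed with ?p=N.
    return [first] + [f"{first}?p={page}" for page in range(2, num_pages + 1)]


# Assumed numbers: 52,000 visible articles, limit = 5000.
urls = sitemap_index_urls("https://example.com", "articles", 52000, 5000)
for url in urls:
    print(url)
```

Each of those URLs only has to render at most `limit` items, which is presumably why the timeout went away.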

tuna
  • This is the correct answer (it worked for me). See the documentation: https://docs.djangoproject.com/en/1.9/ref/contrib/sitemaps/#django.contrib.sitemaps.Sitemap.limit. – illagrenan May 17 '16 at 15:02
  • An updated link for the previous comment: https://docs.djangoproject.com/en/dev/ref/contrib/sitemaps/#django.contrib.sitemaps.Sitemap.limit – Mark Chackerian Jul 03 '20 at 13:00
3

Try this:

from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage

And then, inside a view (where `request` is available):

article_list = Article.objects.filter(is_visible=True, date_published__lte=timezone.now())
paginator = Paginator(article_list, 10)
page = request.GET.get('page')


try:
    articles = paginator.page(page)
except PageNotAnInteger:
    articles = paginator.page(1)
except EmptyPage:
    articles = paginator.page(paginator.num_pages)

You can then access the sitemap using URLs like sitemap.xml?page=5
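The fallback behaviour of the `try`/`except` chain above can be sketched without Django. This is only an approximation of what that snippet does, and `resolve_page` is a hypothetical helper name, not part of Django's API.

```python
def resolve_page(raw_page, num_pages):
    """Return the page number the snippet above would end up serving.

    Hypothetical helper mirroring the try/except logic:
    - a missing or non-integer value falls back to page 1
      (the PageNotAnInteger branch)
    - an out-of-range value falls back to the last page
      (the EmptyPage branch)
    """
    try:
        number = int(raw_page)
    except (TypeError, ValueError):
        return 1
    if number < 1 or number > num_pages:
        return num_pages
    return number


print(resolve_page("5", 10))
print(resolve_page(None, 10))
print(resolve_page("99", 10))
```

Note that this approach paginates in a custom view; it is separate from the built-in pagination that Django's sitemap views perform.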

  • Yes, I know this, but the documentation says that it handles the pagination itself once an index is created. I am not sure where I went wrong, or how to create an index. – tuna Jul 14 '14 at 17:14
  • The docs say `You should create an index file if one of your sitemaps has more than 50,000 URLs. In this case, Django will automatically paginate the sitemap, and the index will reflect that.` – tuna Jul 14 '14 at 17:18
  • In that case, try adding a database index on the table on the `is_visible` field. You can do that using `db_index=True`. IMO, this really boils down to database optimization. You might end up looking at your queries and trying a bunch of stuff to tune them on the DB side. –  Jul 14 '14 at 17:22
  • The `is_visible` field already has `db_index=True`, and I am still having this problem. – tuna Jul 14 '14 at 17:34
  • Try running some DB diagnostics, in that case. Get the query that django is trying to invoke on the DB server, and run an explain plan on it. –  Jul 14 '14 at 18:05
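As a rough illustration of the "explain plan" suggestion in the last comment, here is a self-contained sketch using Python's built-in `sqlite3` module rather than the asker's real database. The table name `article`, the index name, and the date literal are assumptions. In a Django project, `str(queryset.query)` shows the SQL for a queryset and `QuerySet.explain()` asks the database for its plan.

```python
import sqlite3

# In-memory stand-in for the real database (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        is_visible INTEGER NOT NULL,
        date_published TEXT NOT NULL
    );
    CREATE INDEX article_is_visible ON article (is_visible);
    """
)

# Roughly the filter from the question's items() method.
query = "SELECT id FROM article WHERE is_visible = 1 AND date_published <= ?"

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query,
# including whether it scans the table or searches via an index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("2014-07-14",)).fetchall()
plan_details = [row[-1] for row in plan]  # last column is the readable detail
for detail in plan_details:
    print(detail)
```

Other databases expose the same idea through their own `EXPLAIN` syntax, and the output format differs per backend.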