Creating a sitemap with Django

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A sitemap is an XML file that lists a site’s URLs. It allows webmasters to include additional information about each URL: when it was last updated, how often it is likely to be changed, and how important it is in relation to other URLs on the site. This allows search engines to crawl the site more intelligently.
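Here is a minimal sketch of what such a file contains. It uses Python's standard xml.etree.ElementTree module rather than Django. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the Sitemaps protocol [2]. The function name build_sitemap and the example.com URL are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = entry["loc"]
        # The optional hints are written only when they are given.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example entry.
print(build_sitemap([
    {"loc": "http://example.com/post/1/", "changefreq": "monthly", "priority": 0.5},
]))
```

The framework used below produces this kind of file automatically from your models, so you never have to write the XML by hand.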

This text shows how to generate a sitemap.xml file with “django.contrib.sitemaps”, using a simple blog application as the example.
You can download the complete code here [3].


create the base system

First create a new project called “my_project”.

django-admin.py startproject my_project

Change into the newly created folder “my_project” and create an application called “blog”.

cd my_project/; django-admin.py startapp blog

settings

In the settings file (settings.py), choose your database. This example uses sqlite3 with a database file called “database.db”.

DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = 'database.db'

We also need to add “django.contrib.sitemaps”, “django.contrib.sites” and, of course, our application “blog” to the “INSTALLED_APPS” tuple.

INSTALLED_APPS = (
    'django.contrib.sites',
    'django.contrib.sitemaps',
    'my_project.blog',
)

model

The model for the blog entries is called Post.

in my_project/blog/models.py

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    text = models.TextField()

    def get_absolute_url(self):
        return "/post/%s/" % (self.id,)

The model “Post” consists of a title field and a larger text field for the content. It also defines a “get_absolute_url” method, which tells Django how to calculate the URL for an object. The URLs follow the pattern /post/1/.

Now create the database table for the model.

./manage.py syncdb

create data

To create data you can use the Django/Python shell.

./manage.py shell

First import the Post model, then create and save two new objects.

>>> from my_project.blog.models import Post
>>> Post(title="first post", text="first text").save()
>>> Post(title="second post", text="second text").save()

sitemap

The sitemap class in sitemap.py produces one link for every Post object. Django gets each link by calling the object’s “get_absolute_url” method. You can also give search engines hints about how often the pages are likely to change (changefreq), how important they are relative to other pages on the site (priority), and a few other values. See the documentation for details [1].

in my_project/sitemap.py

from django.contrib.sitemaps import Sitemap
from my_project.blog.models import Post

class PostSitemap(Sitemap):
    changefreq = 'monthly'
    priority = 0.5

    def items(self):
        return Post.objects.all()
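Roughly speaking, the framework asks the sitemap for its items(). It then gets each item’s path through location(), which by default calls get_absolute_url(), and prefixes that path with the domain of the current site. The plain-Python sketch below imitates this flow without Django or a database. SimpleSitemap, FakePost, DemoPostSitemap, urls_for_domain and the domain example.com are hypothetical stand-ins, not Django’s actual classes or code.

```python
class FakePost:
    """Hypothetical stand-in for the Post model (no database)."""

    def __init__(self, id):
        self.id = id

    def get_absolute_url(self):
        # Same URL pattern as Post.get_absolute_url in models.py.
        return "/post/%s/" % (self.id,)


class SimpleSitemap:
    """Simplified imitation of a sitemap class, not Django's implementation."""

    changefreq = None
    priority = None

    def items(self):
        return []

    def location(self, obj):
        # Default behaviour: ask each object for its own URL path.
        return obj.get_absolute_url()

    def urls_for_domain(self, domain):
        # Prefix each item's path with the domain and attach the hints.
        return [
            {
                "location": "http://%s%s" % (domain, self.location(obj)),
                "changefreq": self.changefreq,
                "priority": self.priority,
            }
            for obj in self.items()
        ]


class DemoPostSitemap(SimpleSitemap):
    changefreq = "monthly"
    priority = 0.5

    def items(self):
        return [FakePost(1), FakePost(2)]


for entry in DemoPostSitemap().urls_for_domain("example.com"):
    print(entry["location"], entry["changefreq"], entry["priority"])
```

The class attributes changefreq and priority apply the same hint to every item, which matches the PostSitemap class above.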

controller

Now we add a URL pattern that maps the address sitemap.xml to Django’s sitemap view. This view generates the XML file from the sitemaps dictionary.

in my_project/urls.py

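from django.conf.urls.defaults import *
from django.contrib.sitemaps import views as sitemap_views
# The sitemap class defined in my_project/sitemap.py
from my_project.sitemap import PostSitemap

# Maps a section name to its Sitemap class
sitemaps = {
    'post': PostSitemap,
}

urlpatterns = patterns('',
    # The escaped dot matches only the literal address "sitemap.xml"
    (r'^sitemap\.xml$', sitemap_views.sitemap, {'sitemaps': sitemaps}),
)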
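URL patterns are regular expressions. In a regular expression an unescaped “.” matches any character, so ^sitemap.xml$ also accepts an address such as sitemapXxml. Escaping the dot as \. restricts the pattern to a literal dot. A quick check with Python’s re module shows the difference. The test string sitemapXxml is just an arbitrary example.

```python
import re

loose = re.compile(r'^sitemap.xml$')    # "." matches any single character
strict = re.compile(r'^sitemap\.xml$')  # "\." matches only a literal dot

print(bool(loose.match("sitemap.xml")))   # True
print(bool(loose.match("sitemapXxml")))   # True, because "." matched "X"
print(bool(strict.match("sitemap.xml")))  # True
print(bool(strict.match("sitemapXxml")))  # False
```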

result

Now start the development server and point your browser to http://127.0.0.1:8000/sitemap.xml.

./manage.py runserver

The generated XML file should be displayed in your browser.
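To check the result outside the browser, you can list the URLs the file contains. This minimal sketch uses Python’s standard xml.etree.ElementTree module. The host part of each URL comes from the current Site object, and Django’s sites framework creates example.com by default. For that reason, the hand-written sample document below (not real output) uses that domain. The function name sitemap_locations is made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the Sitemaps protocol namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return the <loc> values listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hand-written sample for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/post/1/</loc><changefreq>monthly</changefreq><priority>0.5</priority></url>
  <url><loc>http://example.com/post/2/</loc><changefreq>monthly</changefreq><priority>0.5</priority></url>
</urlset>"""

print(sitemap_locations(sample))
```

With the development server running, you could pass the output of urllib.request.urlopen("http://127.0.0.1:8000/sitemap.xml").read() to the same function, because ET.fromstring also accepts bytes.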

Links:

[1] http://www.djangoproject.com/documentation/sitemaps/
[2] http://www.sitemaps.org
[3] http://media.b23.at/download/django_sitemap.tar.gz
