General approach to making a Wagtail site based on an existing database

Question

Please offer guidance in setting up a Wagtail site from an existing, fairly large database.

At its core, the site will be very conventional, consisting of two basic page types: an 'index' page type that lists multiple items in the database, and a 'details' page type that displays information about an individual item.

Let's say that I'm working from a database of living things. In this scenario, each record in the main database table corresponds to a particular species.

In the planned Wagtail site, a given index page would present a list of species (likely filtered by categories such as 'plant,' 'extinct,' or 'imaginary'). The user would be able to click on any listed species to display the detail page for that species, which would give a full description and a photo of the insect, fern, or dinosaur. The detail page would also show other information such as the species' full taxonomic classification, its geographic range, and so on.

My existing database is incomplete but already contains hundreds of thousands of 'species,' so I need to import that data into Wagtail (creating a Wagtail page from scratch for each existing database record would not be at all practical, of course).

What would be the best way to import the database and access the data in Wagtail? I could:

1. instantiate programmatically (per the technique given here) a Wagtail page for each species in the current database, so that each species has its own 'permanent' page;

or

2. import the data into the site's Wagtail database, independent of the page structure. I would then have to create snippet models to access the data, and add code to render a page dynamically for any chosen species, based on data provided by an appropriate query run on those snippets.

I'm a complete novice at all of this, but method 1 may have two advantages: the predefined pages created during the import will already have slugs that I can easily convert to clickable links on the index pages; and the predefined pages will be discoverable by search engines.

Method 2 seems cleaner to me—keeping the core, "data data" separate from the "page data" (such as the title and slug) is logical and would simplify import/export procedures. But am I correct that it would reduce visibility of the individual pages to search engines? Aside from the programming required to render the pages dynamically, would it have other advantages or disadvantages compared to method 1?

(Regarding method 2: In Wagtail, how can pages be rendered dynamically from snippet-based query results? I know about RoutablePageMixin, but the examples I've found use it to access already-defined Wagtail pages. I would instead want to put data from snippet queries into a template, with a slug created during the page rendering process. A recipe or example would be appreciated.)

How will the data be used after the port to Wagtail? Is there an existing process that adds new data, or will it be managed through the Wagtail CMS once set up? — Michael Lindsay, Jan 05 '21 at 21:20
@MichaelLindsay: sorry for the delay. There had been no responses, so I moved on and begged help from someone I cold-called, who graciously gave input. I chose the Page approach for now. But in answer to your question, my expectation is that for efficiency reasons the project will continue to use table-oriented database tools to enter and update the individual records (i.e., for the 'species') and then upload those changes to the Wagtail database. — Joan Eliot, Jan 13 '21 at 19:17
So the plan is to make a Page per species, and update/create those pages using the admin command, essentially copying the data. Considering how much wagtail likes tree structures, will you make classes for Kingdom, Phylum, Class, etc? If you have search, I recommend starting with the elasticsearch backend. — Michael Lindsay, Jan 13 '21 at 19:48
Thanks for the encouragement to use elasticsearch. So far I'm using Wagtail's builtin search but I'll need something more robust. — Joan Eliot, Jan 15 '21 at 14:47

score 1 · Answer 1 · answered Jan 15 '21 at 15:54

No one has responded to this question so far, but I did receive very helpful guidance from an experienced Wagtail developer I contacted on my own. Based on that input I decided to use method #1 (a Wagtail page for each individual item/'species'), for reasons I'll state at the end of this post.

First, here's an edited version of the input I got over a couple of emails:

I cannot give you a definite answer as I do not know enough about the problem you are trying to solve, and remember that anything can be changed in the future. But I think that it might be easier initially to go with option 1. The Page-as-entity approach makes the whole thing easier to reason about. Here are some pros and cons to consider:

Core Page Models

Upside: You get slugs, unique URLs for each entity, SEO features, page rendering controls (e.g. publish checkbox), searching, translation etc — all of these features work out of the box without too much work. You get tree structures for free.

Downside: You are forced into a URL and page hierarchy early in development, which may not be suitable in the future. With thousands of pages you may start to run into performance issues, and editing page data through the Wagtail admin interface isn't very efficient. It will not be easy to share pages across multiple sites (e.g., if you have a simplified site for public use and another site with more data where login is required).

Mitigation to the downsides:
URL Structure - Set up a page hierarchy that is independent of your categories/grouping (see below)
Performance - custom code may help if you start to run into trouble, but best to wait until you actually have issues. Page listing is cached well and does not access the deeper models until needed.
Editing interface - the editing setup could be enhanced with ModelAdmin to give you multiple ways to access and navigate the page lists: https://docs.wagtail.io/en/stable/reference/contrib/modeladmin/index.html
Cross-site sharing - Not really an easy way around this

Core Entity Model as Plain Django Models (registered as Snippets)

Choose this approach if you want to keep your entity data isolated from how it is used in pages and across sites, giving you more control.

Upsides: More control over how the entity models are used, no need to stick to some of the rigid fields of Title/Slug/id etc, and more control over how the editing interface presents these entities (e.g. via Snippet editing or ModelAdmin, without having to think about how these items are used in the Page editing UI area).

Downsides: Potential performance issues when showing listings of snippets for selection; internationalisation/translation will be harder once you get to that point; and showing each entity as a standalone page is harder and will require rewriting some of the things that Page models already do.

Mitigation to the downsides:
Performance - similar to above, solve it when it becomes an issue, you may need to get your hands dirty with a bit of smarter indexing and Django query caching.
Internationalisation - possible but you may find you will be working against the framework a bit.
Standalone page rendering - RoutablePageMixin is your friend. You will essentially need to implement a page view that works for each unique entity, generating slugs on the fly (which, remember, could change, with SEO implications) and handling the other things that normal Page views do by default. All of it is doable, though.
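
One way to soften the slug concern, sketched here as an illustration rather than as part of the advice above, is to put a stable record id at the front of each generated slug so that a URL still resolves after a name changes. Everything named below (`species_slug`, `parse_slug`, `resolve`, the `records` dictionary standing in for a database table) is hypothetical. The commented outline at the end assumes a Wagtail version whose RoutablePageMixin provides `render()`, and the template name in it is made up.

```python
import re


def species_slug(species_id, name):
    """Build a slug such as '42-panthera-leo'; the id prefix is the stable part."""
    words = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{species_id}-{words}"


def parse_slug(slug):
    """Return the species id encoded in a slug, or None if there is no id prefix."""
    match = re.match(r"^(\d+)(?:-|$)", slug)
    return int(match.group(1)) if match else None


def resolve(slug, records):
    """Look up a record by slug.

    Returns (record, canonical_slug). The canonical slug differs from the
    requested one when the name has changed, so a view can redirect to it.
    """
    species_id = parse_slug(slug)
    record = records.get(species_id)
    if record is None:
        return None, None
    return record, species_slug(species_id, record["name"])


# Stand-in for the species table (hypothetical data).
records = {42: {"name": "Panthera leo"}}
record, canonical = resolve("42-felis-leo", records)  # old name in the URL

# Inside a Wagtail project the same lookup could sit in a routable view, roughly:
#
#     from wagtail.contrib.routable_page.models import RoutablePageMixin, route
#
#     class SpeciesIndexPage(RoutablePageMixin, Page):
#         @route(r"^(?P<slug>[\w-]+)/$")
#         def species_detail(self, request, slug):
#             ...  # resolve the slug against the species table; 404 or redirect
#             return self.render(
#                 request,
#                 context_overrides={"species": species},
#                 template="species_detail.html",  # hypothetical template
#             )
```

With this scheme the words after the id are cosmetic: a request for an outdated slug can be redirected to the canonical one instead of returning a 404.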

Either way, it's important that you plan your page structure carefully. For option one it would be much better to use a flatter approach, even though the tree structure of pages makes it seem easy to make things really nested; sometimes that can backfire. If your page structure takes a flat approach for entities (i.e., one page for 'entities', with each entity as a single page directly underneath), then you are free to make other pages, snippets, tags, or a mix of anything that relates to these entity pages. You can then make 100 ways to 'list' these pages that are completely separate from the actual individual entity page (in terms of the tree structure).

With the above in mind, I decided that the advantages of tying our core data into the Wagtail page structure outweigh those of keeping the data in independent models/tables. One big factor is simply that things will go faster at a time when I'm just trying to get a first version of the site running. Another is that translated/internationalized versions of the site are indeed planned, so I want the benefits of Wagtail's support in that area.

As far as the inefficiency of editing large amounts of data in Wagtail goes, the solution there is to use database tools and then upload the changes to the Wagtail db. Access to those core content fields will probably have to be restricted in the Wagtail admin so that content editors don't make changes that get overwritten with the next upload.
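
A minimal sketch of what such an upload step could look like, under several assumptions: a hypothetical `SpeciesPage` model in a hypothetical `myapp` app, with a `species_id` field matching the key in the source database, and a CSV export with `species_id` and `scientific_name` columns. The planning part is plain Python. Only `apply_plan` touches Wagtail, and it has to run inside a configured project, for example from a management command.

```python
import csv
import io
import re
import unicodedata


def make_slug(text):
    """ASCII-only slug: lowercase, with hyphens replacing runs of other characters."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def plan_import(rows, existing):
    """Split incoming rows into pages to create and pages to update.

    rows: iterable of dicts with "species_id" and "scientific_name" keys.
    existing: dict mapping species_id to the title of the page already in Wagtail.
    """
    to_create, to_update = [], []
    for row in rows:
        species_id = row["species_id"]
        title = row["scientific_name"].strip()
        if species_id not in existing:
            to_create.append({"species_id": species_id, "title": title, "slug": make_slug(title)})
        elif existing[species_id] != title:
            to_update.append({"species_id": species_id, "title": title})
    return to_create, to_update


def apply_plan(parent_page, to_create, to_update):
    """Write the plan to Wagtail; parent_page is the page that holds all species pages."""
    # Imported lazily so the planning code above runs without Django configured.
    from myapp.models import SpeciesPage  # hypothetical app and model

    for item in to_create:
        page = SpeciesPage(title=item["title"], slug=item["slug"], species_id=item["species_id"])
        parent_page.add_child(instance=page)  # places the new page in the tree
        page.save_revision().publish()
    for item in to_update:
        page = SpeciesPage.objects.child_of(parent_page).get(species_id=item["species_id"])
        page.title = item["title"]
        page.save_revision().publish()


# Dry run on an in-memory CSV; record 1 exists under an older name, record 2 is new.
sample = io.StringIO("species_id,scientific_name\n1,Panthera leo\n2,Pteridium aquilinum\n")
to_create, to_update = plan_import(csv.DictReader(sample), existing={"1": "Felis leo"})
```

Separating the plan from the write step makes it possible to review what an upload would change before anything is saved. Slugs must be unique among sibling pages, so two records with the same name would need a disambiguating suffix that this sketch does not add.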

Going with the Page model method will mean some loss of flexibility and I'll just have to endure the idea of the core data being "contaminated" with (display-related) Wagtail page items in the same tables.

But I have since come across an impressive open-source Wagtail project that uses snippet-style configuration for its core data: https://github.com/okfn/rtei / https://www.rtei.org/en/
