What is the fastest template system for Python?

Question

Jinja2 and Mako are both apparently pretty fast.

How do these compare to (the less featured but probably good enough for what I'm doing) string.Template ?

"compare"? Do you want to compare speed? The jinja folks say string.Template is faster. What more do you need to know? Or do you want to compare some other aspect? — S.Lott, Aug 24 '09 at 19:42
You probably don't care how fast the templating system is. Among the popular ones, they all have perfectly acceptable performance characteristics. Please make decisions like this based on more important things, like ease of programming. — Christian Oudard, Aug 24 '09 at 21:01
It depends, really. Where I work we serve a lot of templates per second and we have an army of highly skilled coders and designers, so in this context speed is more important than "ease of programming". Moreover, I would say that ease of reading is more important than ease of programming. — gb., Mar 28 '12 at 14:21
@techtonik If you have more info to add, you can do so in a new answer - it is considered bad practice changing the actual content of an answer :) — Emil, Jan 19 '13 at 17:08
@Emil the picture doesn't add more info, but makes the answer better supported. — anatoly techtonik, Jan 31 '13 at 07:08
@techtonik In my opinion, the argumentation in an answer is up to the original poster. Please read [this article](http://blog.stackoverflow.com/2009/03/the-great-edit-wars) about editing. — Emil, Jan 31 '13 at 08:47
@Emil - I've read the article just to confirm that my edit follows good editing practice - `Clarify meaning without changing it.` and `Add related resources or links.`. As you may see from the top voted comment, the link is useful, so I've added the picture in case the link goes away. Let me also say why I disapproved of your edit - there was no explanation of why you reverted, and even after reading the article you linked I can't see any grounds for reverting. I am letting this go your way even though you're not the author, just because it isn't worth the hassle. =) — anatoly techtonik, Jan 31 '13 at 09:42
@techtonik I'm not going to discuss this any further, but you pointed to my point: the top comment. Comments are for adding relevant information or debating its content. Feel free to edit as you like, my rollback might have been excessive. :) — Emil, Jan 31 '13 at 11:33
@ChristianOudard "they all have perfectly acceptable performance characteristics…" except not one has controllable flush points, meaning not one is suitable for my use case of generating _and **streaming**_ large RSS feeds. Mako was taking up to a minute, the timeout was 30 seconds. Nobody would get anything, and the server would absolutely thrash RAM in the process. Wrote a new template engine ([cinje](https://github.com/marrow/cinje#readme)) with explicit `: flush`, and now our feeds start streaming instantly, and take no more than 20 seconds total. — amcgregor, Jul 03 '19 at 15:32

Ants Aasma · Accepted Answer · 2009-08-25T06:04:56.813

105

Here are the results of the popular template engines for rendering a 10x1000 HTML table.

Python 2.6.2 on a 3GHz Intel Core 2

Kid template                         696.89 ms
Kid template + cElementTree          649.88 ms
Genshi template + tag builder        431.01 ms
Genshi tag builder                   389.39 ms
Django template                      352.68 ms
Genshi template                      266.35 ms
ElementTree                          180.06 ms
cElementTree                         107.85 ms
StringIO                              41.48 ms
Jinja 2                               36.38 ms
Cheetah template                      34.66 ms
Mako Template                         29.06 ms
Spitfire template                     21.80 ms
Tenjin                                18.39 ms
Spitfire template -O1                 11.86 ms
cStringIO                              5.80 ms
Spitfire template -O3                  4.91 ms
Spitfire template -O2                  4.82 ms
generator concat                       4.06 ms
list concat                            3.99 ms
generator concat optimized             2.84 ms
list concat optimized                  2.62 ms

The benchmark is based on code from Spitfire performance tests with some added template engines and added iterations to increase accuracy. The list and generator concat at the end are hand coded Python to get a feel for the upper limit of performance achievable by compiling to Python bytecode. The optimized versions use string interpolation in the inner loop.
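To make the hand-coded baselines concrete, here is a minimal sketch of what "list concat", "generator concat", and the "optimized" variant might look like. This is not the original benchmark code: the function names, the `TABLE` data, and the `time_all` helper are all assumptions made for illustration, and the timings it produces will not match the figures above.

```python
import timeit

# Hypothetical stand-in data: 1000 rows of 10 integer cells each.
TABLE = [list(range(10)) for _ in range(1000)]


def list_concat(table):
    """Append every fragment to a list, then join once at the end."""
    out = ["<table>\n"]
    for row in table:
        out.append("<tr>")
        for cell in row:
            out.append("<td>")
            out.append(str(cell))
            out.append("</td>")
        out.append("</tr>\n")
    out.append("</table>")
    return "".join(out)


def generator_concat(table):
    """Yield the same fragments from a generator and join them."""
    def fragments():
        yield "<table>\n"
        for row in table:
            yield "<tr>"
            for cell in row:
                yield "<td>"
                yield str(cell)
                yield "</td>"
            yield "</tr>\n"
        yield "</table>"
    return "".join(fragments())


def list_concat_optimized(table):
    """Use string interpolation in the inner loop: one fragment per cell."""
    out = ["<table>\n"]
    for row in table:
        out.append("<tr>")
        for cell in row:
            out.append("<td>%s</td>" % cell)
        out.append("</tr>\n")
    out.append("</table>")
    return "".join(out)


def time_all(number=20):
    """Return the seconds taken by `number` renders of each variant."""
    variants = (list_concat, generator_concat, list_concat_optimized)
    return {fn.__name__: timeit.timeit(lambda: fn(TABLE), number=number)
            for fn in variants}
```

Calling `time_all()` on your own interpreter gives a rough feel for the ceiling a compiling template engine could reach on this kind of workload.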

But before you run out to switch your template engine, make sure it matters. You'll need to be doing some pretty heavy caching and have really optimized code before the differences between the compiling template engines start to matter. For most applications good abstraction facilities, compatibility with design tools, familiarity and other things matter much more.

edited Aug 25 '09 at 06:04

answered Aug 25 '09 at 05:28

Ants Aasma


6

I didn't know that Django template is that sloow. – Joshua Partogi Aug 25 '09 at 05:57
2

I didn't either. It's a small part of the equation for most, but if you're rendering a 10x1000 table of data, you're in trouble. – orokusaki Sep 18 '10 at 21:06
23

This comparison is, of course, highly dependent on what you're doing. What if you're rendering lots of small templates rather than one massive table? Then entirely different performance characteristics of the template engine would become relevant, like template parsing and loading time. Moral? Make optimization decisions based on your own benchmarks of your own code. – Carl Meyer Feb 28 '11 at 19:58
3

Yep, Tenjin has a 3 ms load time for every render. In my case, a forum with threaded comments, Cheetah takes 0.4 ms for 1 comment while Tenjin takes 3 ms; at 50 comments Tenjin and Cheetah meet at 5 ms. At 5000 comments Tenjin is at 40 ms and Cheetah is at 250 ms. – Jul 31 '13 at 18:54
I have [my own copy of the "bigtable" (10x1000) test](https://github.com/marrow/cinje/wiki/Benchmarks) used for benchmarking, having written my own engine after discovering that the other wheels were all square (no mid-stream flushing, sub-optimal performance, hilarious complexity). If TTFB is important to you, 0.02ms (47,938 gen/sec) seems to solidly win the "fastest" badge. (Vs. Tenjin's 50 gen/sec.) – amcgregor Dec 12 '18 at 20:24
Can the answer be updated to have an updated benchmark results from [here](https://github.com/youtube/spitfire#performance)? – AmeyaVS Mar 27 '19 at 15:12
@AmeyaVS A 10-year-old benchmark result set should be updated with a 4-year-old benchmark result set generated under a version of Python that _doesn't exist any more_? Better would be to run the benchmark suite locally for a "real" comparison under your actual runtime of choice. Additionally, there may be variant approaches that are faster due to a reduction in safety, which can be acceptable, ref: the `*_unsafe` benchmark results from my previously supplied result page link. Lastly, more important, IMO, is "time to first byte" (TTFB), thus the `*_flush_first` results. – amcgregor Jan 13 '20 at 15:57

score 10 · Answer 2 · answered Aug 24 '09 at 19:30

From the jinja2 docs, it seems that string.Template is the fastest if that's all you need.

Without a doubt you should try to remove as much logic from templates as possible. But templates without any logic mean that you have to do all the processing in the code which is boring and stupid. A template engine that does that is shipped with Python and called string.Template. Comes without loops and if conditions and is by far the fastest template engine you can get for Python.
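For anyone who hasn't used it, here is a small sketch of what "all the processing in the code" looks like with `string.Template`. The `ROW` and `PAGE` templates and the `render` function are made-up names for illustration; only `Template` and `substitute` come from the standard library.

```python
from string import Template

# Hypothetical templates: one for a table row, one for the whole page.
ROW = Template("<tr><td>$name</td><td>$score</td></tr>")
PAGE = Template("<h1>$title</h1>\n<table>\n$rows\n</table>")


def render(title, records):
    """Render a page; the loop lives in Python, not in the template."""
    # string.Template has no loop construct, so iteration happens here.
    # Note: nothing is HTML-escaped in this sketch.
    rows = "\n".join(
        ROW.substitute(name=record["name"], score=record["score"])
        for record in records
    )
    return PAGE.substitute(title=title, rows=rows)
```

Something like `render("Scores", [{"name": "ann", "score": 3}])` returns the heading followed by a one-row table. Conditions work the same way: you pick which template or value to substitute in Python code.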

score 3 · Answer 3 · answered Nov 09 '09 at 00:16


If you can throw caching in the mix (like memcached) then choose based on features and ease of use rather than optimization.

I use Mako because I like the syntax and features. Fortunately it is one of the fastest as well.
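As a rough illustration of why caching makes engine speed matter less, here is a sketch that memoizes rendered fragments. It uses `functools.lru_cache` as an in-process stand-in for an external cache such as memcached, and `string.Template` as a stand-in for whichever engine you pick; `PROFILE` and `render_profile` are hypothetical names.

```python
from functools import lru_cache
from string import Template

# Hypothetical fragment template; any engine's render call could sit here.
PROFILE = Template("<div class='profile'>$name has $posts posts</div>")


@lru_cache(maxsize=1024)
def render_profile(name, posts):
    """Render once per distinct (name, posts); later calls hit the cache."""
    return PROFILE.substitute(name=name, posts=posts)
```

A repeated call with the same arguments never reaches the template engine, so its render time only matters on cache misses. `render_profile.cache_info()` shows the hit and miss counts.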


Tony


We used to use Mako templates, until our multi-megabyte RSS feeds began failing to generate in time (<30s). Because Mako buffers the whole response, our users were getting absolutely nothing back and our memory pressure was extraordinarily high. [It's also not particularly fast.](https://github.com/marrow/cinje/wiki/Benchmarks#python-37) It's in the same class as Bottle's engine, or Tenjin, but pales in comparison to my own, which does support mid-stream flushing. (48 gen/sec vs. 47,937 gen/sec.) (Now, even if the feed takes longer than 30s to generate, it doesn't time out and is _streamed_.) – amcgregor Apr 19 '19 at 16:59
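To illustrate the buffering-versus-streaming distinction in that comment, here is a generic sketch using a plain Python generator. It is not cinje's API or Mako's; `stream_feed` and `flush_every` are hypothetical names, and the sketch does no XML escaping.

```python
def stream_feed(items, flush_every=100):
    """Yield the feed in chunks instead of building one large string."""
    buffer = ["<rss><channel>\n"]
    for count, item in enumerate(items, start=1):
        buffer.append("<item><title>%s</title></item>\n" % item)
        if count % flush_every == 0:
            # Flush point: hand the chunk to the caller and start over,
            # so memory use stays bounded by one chunk.
            yield "".join(buffer)
            buffer = []
    buffer.append("</channel></rss>")
    yield "".join(buffer)
```

The caller can start sending the first chunk while later items are still being generated. A WSGI application, for example, can return an iterable of byte strings, so each chunk would be encoded before being handed over.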

score 1 · Answer 4 · answered Aug 24 '09 at 20:19

In general you will have to do profiling to answer that question, as it depends on how you use the templates and what for.
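A minimal sketch of that kind of measurement, using `timeit` from the standard library. It times two usage patterns for `string.Template`: building the template on every render versus reusing one instance. `SOURCE`, the function names, and `measure` are made up for the example; swap in your own engine and templates.

```python
import timeit
from string import Template

# Hypothetical template source for the measurement.
SOURCE = "Hello $name, you have $count new messages."


def parse_and_render():
    """Build the template object on every call, then render it."""
    return Template(SOURCE).substitute(name="ann", count=3)


# Build once at import time and reuse for every render.
COMPILED = Template(SOURCE)


def render_only():
    """Render with the prebuilt template object."""
    return COMPILED.substitute(name="ann", count=3)


def measure(number=10000):
    """Return the seconds taken by `number` calls of each pattern."""
    return {fn.__name__: timeit.timeit(fn, number=number)
            for fn in (parse_and_render, render_only)}
```

Run `measure()` under the Python version and workload you actually deploy. The gap between the two patterns differs a lot from engine to engine, which is why generic benchmark tables only go so far.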

string.Template is the fastest, but so primitive it can hardly be called a template in the same breath as the other templating systems, as it only does string replacements, and has no conditions or loops, making it pretty useless in practice.

score -4 · Answer 5 · answered Aug 25 '09 at 00:29


I think Cheetah might be the fastest, as it's implemented in C.


Koen Bok


21

Just because something is written in C, it does not mean that it will be faster; these things depend highly on the developer. – kzh Jun 03 '10 at 15:58
6

Yes, what kzh says is true. Also, Cheetah isn't written in C -- it's written in Python. However, a small part of it, the "name mapper", can optionally use a much faster, compiled C version. – Ben Hoyt Feb 15 '11 at 21:27
I once wrote an HTTP/1.1 server in Python. For normalizing the header names, which under WSGI must be `ALL_UPPER_UND_SEP` (a la CGI) I thought, gee, C code must be efficient for this, so I "borrowed" the relevant C module from Rack, or another Ruby web server. It was 1000 lines of C machine-generated from Ruby. The Python ran _**substantially** faster_ than the compiled C, despite repeated `.replace()` invocations and chains of other filters (`upper`, &c.), and was [infinitely easier to understand](https://github.com/marrow-legacy/server.http/commit/0c769e1e9226cdb972b9f90a660a459c4ccc904a). – amcgregor Mar 26 '19 at 17:27
Ah, right, I'll also point out the template-specific benchmarks. Cheetah is one of the absolute worst performers in the "big table" test, that is, generation of a single `<table>` with 10 columns and 1000 rows from a static list of dicts. [It gets just under 10 generations per second.](https://github.com/marrow/cinje/wiki/Benchmarks#python-37) Any of the modern parser/generator engines trounce it (even Mako is 2x faster), and my own method of streaming generation leaves it in the dust. (Just under 48,000 generations per second: Cheetah is thus 0.02% as performant. Not 2%. 0.02%.) – amcgregor Apr 19 '19 at 17:06
