( Comparing the incomparable is always a double-edged sword.
What follows is presented in the fair belief that benchmarks of LLVM / JIT-powered code ought to be compared against other LLVM / JIT-powered alternatives, if any conclusion drawn from them is to serve as a basis for reasonably supported decisions. )
Intro : ( the numba
stuff and the [us] results come a bit lower down the page )
With all due respect, the julia-lang official site presents a tabulated set of performance tests, where two categories of facts are stated. The first, how the performance test was performed ( julia, using LLVM-compiled code-execution, vs. python, remaining a GIL-stepped, interpreted code-execution ). The second, how much longer the other languages take to complete the same "benchmark-task", using C-compiled code execution as the relative unit of time = 1.0
The chapter header above the table with results says ( cit.: )
High-Performance JIT Compiler
Julia’s LLVM-based just-in-time (JIT) compiler combined with the language’s design allow it to approach and often match the performance of C.

I thought it a bit more rigorous to compare apples to apples and took just one of the "benchmark-task"-s, the one called pi-sum.
This was the second-worst time for interpreted python, reported to have run 21.99 times slower than the LLVM/JIT-compiled julia code or the C-compiled alternative.
So the small experimentation story started.
@numba.jit( JulSUM, nogil = True ):
Let's start by comparing apples to apples. If the julia code is reported to run 22x faster, let's first measure a plain, interpreted python code run:
>>> def JulSUM():
...     sum = 0.
...     j   = 0
...     while j < 500:
...         j  += 1
...         sum = 0.
...         k   = 0
...         while k < 10000:
...             k   += 1
...             sum += 1. / ( k * k )
...     return sum
...
>>> from zmq import Stopwatch
>>> aClk = Stopwatch()
>>> aClk.start();_=JulSUM();aClk.stop()
1271963L
1270088L
1279277L
1277371L
1279390L
1274231L
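For readers without pyzmq installed, the same baseline measurement can be sketched with the standard-library timeit instead of zmq.Stopwatch ( an assumption about the reader's environment, not the original tooling ):

```python
# A minimal sketch of the interpreted baseline, using timeit instead of
# zmq.Stopwatch -- same JulSUM arithmetic, here written with for-loops:
import timeit

def JulSUM():
    s = 0.
    for j in range(500):           # outer repetition loop of the pi-sum task
        s = 0.
        for k in range(1, 10001):  # partial sum of 1 / k^2 for k = 1 .. 10000
            s += 1. / (k * k)
    return s

# one timed call, reported in seconds; on comparable hardware plain CPython
# lands in the ~ 1.3 [s] ballpark measured above
t = timeit.timeit(JulSUM, number=1)
print("pi-sum took %.3f [s], result = %.12f" % (t, JulSUM()))
```

timeit reports wall-clock seconds, so its figures divide directly by the ~ 22x ratio quoted from the julia-lang table.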
So, the core of the pi-sum
runs in about 1.27x.xxx [us], i.e. about 1.27 ~ 1.28 [s]
Given the table row for pi-sum
in the language comparison on the julia-lang website, the LLVM/JIT-powered julia code execution ought to run about 22x faster, i.e. in under ~ 57.92 [ms]
>>> 1274231 / 22
57919
So, let's convert oranges to apples, using numba.jit ( v24.0 )
>>> import numba
>>> JIT_JulSUM = numba.jit( JulSUM )
>>> aClk.start();_=JIT_JulSUM();aClk.stop()
1175206L
>>> aClk.start();_=JIT_JulSUM();aClk.stop()
35512L
37193L
37312L
35756L
34710L
So, after the JIT-compiler has done its job, numba-LLVM'ed python exhibits benchmark times somewhere about 34.7 ~ 37.3 [ms]
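The expensive first call above ( ~ 1.17 [s] ) is the compilation itself. One way to pay that cost up front is eager compilation with an explicit signature, sketched below; the try/except fallback is an assumption about the reader's environment, not part of the original experiment:

```python
# Sketch: give numba.jit an explicit signature ( "float64()" -- no arguments,
# float64 result ), so the function is compiled at decoration time and even
# the first timed call avoids the one-off compilation cost.
def JulSUM():
    s = 0.
    for j in range(500):
        s = 0.
        for k in range(1, 10001):
            s += 1. / (k * k)
    return s

try:
    import numba
    JIT_JulSUM = numba.jit("float64()", nogil=True)(JulSUM)  # eager compile
except ImportError:
    JIT_JulSUM = JulSUM   # fallback: plain interpreted python, same result

print(JIT_JulSUM())
```

With the signature supplied, the ~ 1 [s] warm-up disappears from the timed section and every measured call sits in the post-warm-up band.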
Can we go farther?
Oh sure, we have not done much numba
tweaking yet; though with a code example this trivial, not many surprising advances are to be expected down the road.
First, let's remove the GIL-stepping, unnecessary here:
>>> JIT_NOGIL_JulSUM = numba.jit( JulSUM, nogil = True )
>>> aClk.start();_=JIT_NOGIL_JulSUM();aClk.stop()
85795L
>>> aClk.start();_=JIT_NOGIL_JulSUM();aClk.stop()
35526L
35509L
34720L
35906L
35506L
nogil=True
does not bring the execution much farther,
but still shaves off a few more [ms], driving all results under ~ 35.9 [ms]
>>> JIT_NOGIL_NOPYTHON_JulSUM = numba.jit( JulSUM, nogil = True, nopython = True )
>>> aClk.start();_=JIT_NOGIL_NOPYTHON_JulSUM();aClk.stop()
84429L
>>> aClk.start();_=JIT_NOGIL_NOPYTHON_JulSUM();aClk.stop()
35779L
35753L
35515L
35758L
35585L
35859L
nopython=True
adds just a final polishing touch
to get all results consistently under ~ 35.86 [ms] ( vs. ~ 57.92 [ms] for the LLVM/JIT-julia )
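As a sanity check that all the JIT-ed variants still compute what they should: the "pi-sum" name comes from the inner loop being a partial Basel sum, sum( 1 / k^2 ), which converges to pi**2 / 6. A quick cross-check:

```python
# The inner loop of JulSUM is a partial Basel sum; its limit is pi**2 / 6.
# The difference should be roughly the tail of the series, ~ 1 / 10000.
import math

def JulSUM():
    s = 0.
    for j in range(500):
        s = 0.
        for k in range(1, 10001):
            s += 1. / (k * k)
    return s

basel = math.pi ** 2 / 6                    # exact limit of the infinite sum
print(JulSUM(), basel, basel - JulSUM())    # residual ~ 1e-4
```

Any JIT-ed variant ( nogil, nopython, or both ) must return the same value to the last bit, since the arithmetic is identical double-precision work.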
Epilogue on DSP processing:
For the sake of the OP's question about additional benefits for accelerated DSP-processing,
one may try and test numba
+ Intel Python ( via Anaconda ), where Intel has opened a new horizon with binaries optimised for the internals of Intel processors, so the code-execution may enjoy additional CPU-bound tricks, based on Intel's knowledge of the ILP, vectorisation and branch-prediction details their own CPUs exhibit at runtime. Worth a test to compare ( plus one may enjoy their non-destructive code-analysis tool, integrated into VisualStudio, where in-vitro code-execution hot-spots can be analysed in real-time -- a thing a DSP engineer would just love, wouldn't he/she? )
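For DSP-style workloads the same arithmetic often vectorises well even without a JIT. A numpy one-liner for one pass of the pi-sum inner loop is sketched below ( numpy is assumed installed; this is illustrative, not part of the original benchmark ):

```python
# Sketch: one pass of the pi-sum inner loop as a vectorised numpy expression,
# the idiom most DSP pipelines would reach for before JIT-compiling loops.
import numpy as np

k = np.arange(1, 10001, dtype=np.float64)   # k = 1 .. 10000
pi_sum = np.sum(1.0 / (k * k))              # same partial Basel sum as the loop
print(pi_sum)
```

Whether the numpy form or the numba-JIT loop wins depends on array sizes and memory traffic, so for a real DSP kernel it is worth benchmarking both, exactly as done above.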