What are the fastest algorithms for rendering the mandelbrot set?

Question

I've tried many algorithms for the rendering of the Mandelbrot set, inclusive of the naive escape time algorithm, as well as the optimized escape time algorithm. But, are there faster algorithms that are used to produce really deep zooms efficiently like the ones we see on YouTube. Also, I would love to get some ideas on how to increase my precision beyond the C/C++ double

Spektre · Accepted Answer · 2021-03-31T07:29:04.497

Even High end CPU will be much slower in comparison to average GPU. You can get to real time rendering even with naive iteration algo on GPU. So using better algorithms on GPU could get to high zooms however for any decent algo you need:

multi pass rendering as we can not self modify a texture on GPU
high precision floating point as floats/doubles are not enough.

Here few related QAs:

which might get you kick started...

One way to speed up is you can use fractional escape like I did in the first link. It improves image quality while keeping max iteration low.

The second link will get you approximation of which parts of fractal are in and out and how far. Its not very accurate but can be used to avoid computing iterations for parts that are "outside for sure".

Next link will show you how to achieve better precision.

Last link is about Perturbation The idea is that you use high precision math only for some reference points and use that to compute its neighbor points with low precision math without loosing precision. Never used that however but looks promising.

And finally once you achieved fast rendering you might want to aim for this:

How to adjust panning while zooming Mandelbrot set

Here a small example of 3* 64bit double used for single value in GLSL:

// high precision float (very slow)
dvec3 fnor(dvec3 a)
    {
    dvec3 c=a;
    if (abs(c.x)>1e-5){ c.y+=c.x; c.x=0.0; }
    if (abs(c.y)>1e+5){ c.z+=c.y; c.y=0.0; }
    return c;
    }
double fget(dvec3 a){ return a.x+a.y+a.z; }
dvec3 fset(double a){ return fnor(dvec3(a,0.0,0.0)); }
dvec3 fadd(dvec3 a,double b){ return fnor(a+fset(b)); }
dvec3 fsub(dvec3 a,double b){ return fnor(a-fset(b)); }
dvec3 fmul(dvec3 a,double b){ return fnor(a*b); }
dvec3 fadd(dvec3 a,dvec3 b){ return fnor(a+b); }
dvec3 fsub(dvec3 a,dvec3 b){ return fnor(a-b); }
dvec3 fmul(dvec3 a,dvec3 b)
    {
    dvec3 c;
    c =fnor(a*b.x);
    c+=fnor(a*b.y);
    c+=fnor(a*b.z);
    return fnor(c);
    }

so each hi precision value is dvec3 ... the thresholds in fnor can be changed to any ranges. You can convert this to vec3 and float ...

[Edit1] "fast" C++ example

Ok I wanted to try my new SSD1306 driver along with my AVR32 MCU to compute Mandelbrot So I can compare speed with this Arduino + 3D + Pong + Mandelbrot. I used AT32UC3A3256 with ~66MHz no FPU no GPU and 128x64x1bpp display. No external memory only internal 16+32+32 KByte. Naive Mandlebrot was way to slow (~2.5sec per frame) so I busted up something like this (taking advantage of that position and zoom of the view is sort of continuous):

reduce resolution by 2

to make room for dithering as my output is just B&W
use variable max iteration n based on zoom

On change of n invalidate last frame to enforce full recompute. I know this is slow but it happens only 3 times on transitions between zoom ranges.

Scaling count from last frame is not looking good as its not linear.

Its possible to use the last counts but for that it would be needed also to remember the complex variables used for iteration and that would take too much memory.
remember last frame and also which x,y screen coordinate mapped to which Mandelbrot coordinate.
On each frame compute the mapping between screen coordinates and Mandelbrot coordinates.
remap last frame to adjust to new position and zoom

so simply look at the data from #3,#4 and if we have there the same positions in both last and actual frame (closer then half of pixel size), copy the pixels. And recompute the rest.

This will hugely improve performance if your view is smooth (so position and zoom does not change a lot on per frame basis).

I know its a bit vague description so here a C++ code where you can infer all doubts:

//---------------------------------------------------------------------------
//--- Fast Mandelbrot set ver: 1.000 ----------------------------------------
//---------------------------------------------------------------------------
template<int xs,int ys,int sh> void mandelbrot_draw(float mx,float my,float zoom)
    {
    // xs,ys - screen resolution
    // sh    - log2(pixel_size) ... dithering pixel size
    // mx,my - Mandelbrot position (center of view) <-1.5,+0.5>,<-1.0,+1.0>
    // zoom  - zoom
    // ----------------
    // (previous/actual) frame
    static U8 p[xs>>sh][ys>>sh];            // intensities (raw Mandelbrot image)
    static int n0=0;                        // max iteraions
    static float px[(xs>>sh)+1]={-1000.0};  // pixel x position in Mandlebrot
    static float py[(ys>>sh)+1];            // pixel y position in Mandlebrot
    // temp variables
    U8 shd;                                 // just pattern for dithering
    int ix,iy,i,n,jx,jy,kx,ky,sz;           // index variables
    int nx=xs>>sh,ny=ys>>sh;                // real Mandelbrot resolution
    float fx,fy,fd;                         // floating Mandlebrot position and pixel step
    float x,y,xx,yy,q;                      // Mandelbrot iteration stuff (this need to be high precision)
    int   qx[xs>>sh],qy[ys>>sh];            // maping of pixels between last and actual frame
    float px0[xs>>sh],py0[ys>>sh];          // pixel position in Mandlebrot from last frame
    // init vars
         if (zoom<   10.0) n= 31;
    else if (zoom<  100.0) n= 63;
    else if (zoom< 1000.0) n=127;
    else                   n=255;
    sz=1<<sh;
    ix=xs; if (ix>ys) ix=ys; ix/=sz;
    fd=2.0/(float(ix-1)*zoom);
    mx-=float(xs>>(1+sh))*fd;
    my-=float(ys>>(1+sh))*fd;
    // init buffers
    if ((px[0]<-999.0)||(n0!=n))
        {
        n0=n;
        for (ix=0;ix<nx;ix++) px[ix]=-999.0; 
        for (iy=0;iy<ny;iy++) py[iy]=-999.0;
        for (ix=0;ix<nx;ix++)
         for (iy=0;iy<ny;iy++)
          p[ix][iy]=0;
        }
    // store old and compute new float positions of pixels in Mandelbrot to px[],py[],px0[],py0[]
    for (fx=mx,ix=0;ix<nx;ix++,fx+=fd){ px0[ix]=px[ix]; px[ix]=fx; qx[ix]=-1; }
    for (fy=my,iy=0;iy<ny;iy++,fy+=fd){ py0[iy]=py[iy]; py[iy]=fy; qy[iy]=-1; }
    // match old and new x coordinates to qx[]
    for (ix=0,jx=0;(ix<nx)&&(jx<nx);)
        {
        x=px[ix]; y=px0[jx];
        xx=(x-y)/fd; if (xx<0.0) xx=-xx;
        if (xx<=0.5){ qx[ix]=jx; px[ix]=y; }
        if (x<y) ix++; else jx++;
        }
    // match old and new y coordinates to qy[]
    for (ix=0,jx=0;(ix<ny)&&(jx<ny);)
        {
        x=py[ix]; y=py0[jx];
        xx=(x-y)/fd; if (xx<0.0) xx=-xx;
        if (xx<=0.5){ qy[ix]=jx; py[ix]=y; }
        if (x<y) ix++; else jx++;
        }
    // remap p[][] by qx[]
    for (ix=0,jx=nx-1;ix<nx;ix++,jx--)
        {
        i=qx[ix]; if ((i>=0)&&(i>=ix)) for (iy=0;iy<ny;iy++) p[ix][iy]=p[i][iy];
        i=qx[jx]; if ((i>=0)&&(i<=jx)) for (iy=0;iy<ny;iy++) p[jx][iy]=p[i][iy];
        }
    // remap p[][] by qy[]
    for (iy=0,jy=ny-1;iy<ny;iy++,jy--)
        {
        i=qy[iy]; if ((i>=0)&&(i>=iy)) for (ix=0;ix<nx;ix++) p[ix][iy]=p[ix][i];
        i=qy[jy]; if ((i>=0)&&(i<=jy)) for (ix=0;ix<nx;ix++) p[ix][jy]=p[ix][i];
        }
    // Mandelbrot
    for (iy=0,ky=0,fy=py[iy];iy<ny;iy++,ky+=sz,fy=py[iy]) if ((fy>=-1.0)&&(fy<=+1.0))
     for (ix=0,kx=0,fx=px[ix];ix<nx;ix++,kx+=sz,fx=px[ix]) if ((fx>=-1.5)&&(fx<=+0.5))
        {
        // invalid qx,qy ... recompute Mandelbrot
        if ((qx[ix]<0)||(qy[iy]<0))
            {
            for (x=0.0,y=0.0,xx=0.0,yy=0.0,i=0;(i<n)&&(xx+yy<4.0);i++)
                {
                q=xx-yy+fx;
                y=(2.0*x*y)+fy;
                x=q;
                xx=x*x;
                yy=y*y;
                }
            i=(16*i)/(n-1); if (i>16) i=16; if (i<0) i=0;
            i=16-i; p[ix][iy]=i;
            }
        // use stored intensity
        else i=p[ix][iy];
        // render point with intensity i coresponding to ix,iy position in map
        for (i<<=3                 ,jy=0;jy<sz;jy++)
         for (shd=shade8x8[i+(jy&7)],jx=0;jx<sz;jx++)
          lcd.pixel(kx+jx,ky+jy,shd&(1<<(jx&7)));
        }
    }
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------

the lcd and shade8x8 stuff can be found in the linked SSD1306 QA. However you can ignore it its just dithering and outputting a pixel so you can instead output the i directly (even without the scaling to <0..16>.

Here preview (on PC as I was lazy to connect camera ...):

so it 64x32 Mandelbrot pixels displayed as 128x64 dithered image. On my AVR32 is this may be 8x time faster than naive method (maybe 3-4fps)... The code might be more optimized however take in mind Mandelbrot is not the only stuff running as I have some ISR handlers on backround to handle the LCD and also my TTS engine based on this which I upgraded a lot since then and use for debugging of this (yes it can speek in parallel to rendering). Also I am low on memory as my 3D engine takes a lot ~11 KByte away (mostly depth buffer).

The preview was done with this code (inside timer):

static float zoom=1.0;
mandelbrot_draw<128,64,1>(+0.37,-0.1,zoom);
zoom*=1.02; if (zoom>100000) zoom=1.0;

Also for non AVR32 C++ environment use this:

//------------------------------------------------------------------------------------------
#ifndef _AVR32_compiler_h
#define _AVR32_compiler_h
//------------------------------------------------------------------------------------------
typedef int32_t  S32;
typedef int16_t  S16;
typedef int8_t   S8;
typedef uint32_t U32;
typedef uint16_t U16;
typedef uint8_t  U8;
//------------------------------------------------------------------------------------------
#endif
//------------------------------------------------------------------------------------------

[Edit2] higher float precision in GLSL

The main problem with Mandelbrot is that it need to add numbers with very big exponent difference. For +,- operations we need to align mantissa of both operands, add them as integer and normalize back to scientific notation. However if the exponent difference is big then the result mantissa need more bits than can fit into 32 bit float so only 24 most significant bits are preserved. This creates the rounding errors causing your pixelation. If you look at 32bit float in binary you will see this:

float a=1000.000,b=0.00001,c=a+b; 

//012345678901234567890123456789 ... just to easy count bits
a=1111101000b                                         // a=1000
b=         0.00000000000000001010011111000101101011b  // b=0.00000999999974737875
c=1111101000.00000000000000001010011111000101101011b  // not rounded result
c=1111101000.00000000000000b                          // c=1000 rounded to 24 bits of mantissa

Now the idea is to enlarge the number of mantissa bits. The easiest trick is to have 2 floats instead of one:

//012345678901234567890123 ... just to easy count bits
a=1111101000b                                         // a=1000
                       //012345678901234567890123 ... just to easy count     b=         0.0000000000000000101001111100010110101100b    // b=0.00000999999974737875
c=1111101000.00000000000000001010011111000101101011b  // not rounded result
c=1111101000.00000000000000b                          // c=1000 rounded to 24 
 +          .0000000000000000101001111100010110101100b
                           //012345678901234567890123 ... just to easy count bits

so some part of the result is in one float and the rest in the other... The more floats per single value we have the bigger the mantissa. However doing this on bit exact division of big mantissa into 24 bit chunks would be complicated and slow in GLSL (if even possible due to GLSL limitations). Instead we can select for each of the float some range of exponents (just like in example above).

So in the example we got 3 floats (vec3) per single (float) value. Each of the coordinates represent different range:

       abs(x) <= 1e-5
1e-5 < abs(y) <= 1e+5
1e+5 < abs(z)

and value = (x+y+z) so we kind of have 3*24 bit mantissa however the ranges are not exactly matching 24 bits. For that the exponent range should be divided by:

log10(2^24)=7.2247198959355486851297334733878

instead of 10... for example something like this:

       abs(x) <= 1e-7
1e-7 < abs(y) <= 1e+0
1e+0 < abs(z)

Also the ranges must be selected so they handle the ranges of values you use otherwise it would be for nothing. So if your numbers are <4 its pointless to have range >10^+5 So first you need to see what bounds of values you have, then dissect it to exponent ranges (as many as you have floats per value).

Beware some (but much less than native float) rounding still occurs !!!

Now doing operations on such numbers is slightly more complicated than on normal floats as you need to handle each value as bracketed sum of all components so:

(x0+y0+z0) + (x1+y1+z1) = (x0+x1 + y0+y1 + z0+z1)
(x0+y0+z0) - (x1+y1+z1) = (x0-x1 + y0-y1 + z0-z1)
(x0+y0+z0) * (x1+y1+z1) = x0*(x1+y1+z1) + y0*(x1+y1+z1) + z0*(x1+y1+z1)

And do not forget to normalize the values back to the defined ranges. Avoid adding small and big (abs) values so avoid x0+z0 etc ...

[Edit3] new win32 demo CPU vs. GPU

win32 Mandelbrot demo 64bit floats

Both executables are preset to the same location and zoom to show when the doubles starts to round off. I had to upgrade slightly the way how px,py coordinates are computed as around 10^9 the y axis started to deviate at this location (The threshold still might be too big for other locations)

Here preview CPU vs. GPU for high zoom (n=1600):

RT GIF capture of CPU (n=62++, GIF 4x scaled down):

I have already implemented it using GLSL on OpenGL , and it worked, but older versions of OpenGL (< 4.0) does not support ```double``` data type, and I'm new to implementation of arbitrary precision floating point! If it were integers then I would have been able to implement it myself, but I've no idea to implement high precision floats! Also, even when using the GPU approach, when I reach the points that are in the mandelbrot via zooming, my application tends to freeze ! — Vivekanand V, Mar 20 '21 at 08:53
@VivekanandV what resolution and zoom? I usually get to double precision related pixelation around zoom 10^14 while its still RT... And I have old computer ... try my demo from the first link (64bit) to compare with (but it does not use any fancy stuff just fractional escape and 2 pass recoloring based on histogram) — Spektre, Mar 20 '21 at 08:56
@VivekanandV btw from what I heard there are high precision libs for GLSL try to google if you need them. If you want to implement them on your own see the 3th link — Spektre, Mar 20 '21 at 08:58
Thankyou, very much for all that! I will definitely go through them all! My application gets stuck, due to delay in event handling, because of the rendering and I was using SDL 2.0 to get the OpenGL context, and SDL does not allow, the event polling to run as a separate thread. :( Using traditional C++ I was get to the same levels of zoom as you did! Also, I would like to know more about the theory behind using a sum of ```double``` to represent a quantity with sufficient precision! :) — Vivekanand V, Mar 20 '21 at 09:02
@VivekanandV its simple the most problematic operation on float is `+,-` because both operands must be shifted to the same common exponent. So when adding big and small numbers with exponent differenence bigger than mantissa bits you get rounding... when you store numbers for example as sum of values `x = x0 + x1 + x2 + ...` where `x0` holds number up to some value `N` the `x1` holds values `` ... and so on then you have single floating point number but with much bigger mantissa (multiple of the 24bit original from float) ... — Spektre, Mar 20 '21 at 09:08
@VivekanandV btw take a look at this [How to express tetration function, for complex numbers](https://stackoverflow.com/a/56735614/2521214) it might be interesting for you once you bored with Mandelbrot. — Spektre, Mar 20 '21 at 09:28
@VivekanandV I just implemented 3*double arithmetics but the result is the same so either I screwed something up or the pixelation is due to different reasons. — Spektre, Mar 20 '21 at 10:39
Thankyou for all that! I will definitely look into your code deeply and try to understand everything very deeply, before I actually implement it! BTW, what does ```fnor``` mean? I need to do a lot more research into floating point before I can fully comprehend your code! But anyways I liked your detailed explanation as well as the images of the Mandelbrot fractal, that you have attached with the other answers... Keep up the good work! :) — Vivekanand V, Mar 20 '21 at 14:04
@VivekanandV `f` means float and `nor` means normalize ... it just repair the `dvec3` so `x` value holds small values, `y` medium and `z` big values that sum up to the actual number... You can play with the threshold ranges `1e-5, 1e+5` ... however my nVidia OpenGL driver is time-outing for resolution above 400x400 as this is really slow ... it computes ~8.6 times slower than native double... the operations are `add,sub,mul` -> `+,-,*` — Spektre, Mar 20 '21 at 14:08
Mind blowing! Keep up the good work :) I just figured out, how to do multi threaded OpenGL rendering, using SDL . Now my application can do event polling in the main thread and rendering in the other! :) — Vivekanand V, Mar 21 '21 at 16:05
...also, your amazing implementation on Arduino is a great inspiration for programmers like me. It reminds me the endless possibilities of creativity ! Forever amazing...! :) It was only very recently, that I got hooked up with fractals and I've been playing with the Mandelbrot a lot, and I'm grateful, I was able to learn a lot of cool stuff (like OpenGL) !!! — Vivekanand V, Mar 21 '21 at 16:10
@VivekanandV I do not code on Arduino framework ... I use AVR32 and C/C++ instead. to be frank I hate Arduino ... but most MCU people are using it as its cheap ... at the cost of speed and many other things they do not even know the not use ... as Arduino is SW emulated mostly (I mean interfaces, busses and stuff) — Spektre, Mar 21 '21 at 16:12
I haven't used Arduino either! Mostly, all the stuff I do is desktop/mobile related. But, someday I hope to use it (for fun) ! — Vivekanand V, Mar 21 '21 at 16:16
@VivekanandV Arduino is just framework ... you can use the HW with AVR Studio, Atmel studio or any other tool compatible with the chips used ... which are usually Atmel AVR8 ... sometimes AVR32 — Spektre, Mar 21 '21 at 16:17
@VivekanandV but fun it is :) have made full 3D engine with gourard shading , and async TTS before I added the Mandelbrot to properly test the SSD1306 I2C comunication as it has nasty HW incompatibility bugs on both LCD and MCU side I managed to workaround it by circuitry ... and still testing. But its really cool to see 3D models spining with speaking in parallel on the little MCU thing :) without FPU ... — Spektre, Mar 21 '21 at 16:22
Sorry to ask this late... I have been researching a lot on floating point and stuff like that. But still I find it difficult to grasp why ```x = x0 + x1 + x2 ``` can be used to represent ```x``` and why the different indexed ```x``` terms hold different range of values. How is the mantissa getting expanded using this technique!? I think this is amazing! I would love to visualize deep into the bit magic that makes this possible! :) Can you please explain it to me! — Vivekanand V, Mar 28 '21 at 22:58
I understood the basic operations like addition sub, and multiply. It works like polynomials. I see that to do division you need to reciprocate then multiply. But, one thing I still find less clear is how was the value ```1e-5``` chosen? Also, what is ```fnor``` function trying to do when summing up the values in the coordinates, based on comparisons with ```1e-05``` ? Sorry, for asking, I'm new to such amazing tricks with floating point. Therefore, I am confused about this. Also, are there other methods to achieve our goal of greater precision? — Vivekanand V, Mar 30 '21 at 01:14
@VivekanandV float has exponent (`2^-128 ... 2^+127`) which translates to ~(`10^-38 ... 10^+38`) so to evenly divide the usable range into 3 values we could use `(38-(-38))/3 = ~25` decadic exponent as range so `<=1e(-38+25)` , `<=1e-38+50` however its pointless to have range of exponents that our used numbers will never reach so you should divide the range your numbers can be ... I chose `10^-5` blindly around zero blindly better choice for Mandelbrot would be the 10^-7 example I think. When you adding `x = x0+x1` the result can get out of desired range for `x` — Spektre, Mar 30 '21 at 06:27
@VivekanandV so `fnor` distribute the stuff to `y` and does the same for `y` ... so after `fnor` each part (`x,y,z`) is either zero or in their desired ranges. That way you know `x` are smallest numbers, `y` are medium and `z` are big so you can easily avoid adding small and big numbers and doing all operations with as least rounding as possible. You can select the ranges at will... Sometimes this trick is used: `x=float(q); `y=q-x;` where `x,y` are `float`s and `q` is `double` which is kind of similar to `fnor` ... — Spektre, Mar 30 '21 at 06:30
I'm using OpenGL ES for rendering the mandelbrot, therefore I don't have access to the ```double``` data type. If I had at least a ```double``` I would have had a lot more detail :( I'm thinking about increasing the precision, by using techniques like designing an arbitrary precision float. But the only thing that is stopping me from doing that is that it's going to be very slow, and I fear that all my attempt to make the rendering faster will be in vain! But anyways, I'm gonna keep on trying! :) — Vivekanand V, Mar 30 '21 at 10:30
@VivekanandV yes `3*float` , even `2*float` is much much slower than native `double` and `double` is ~ twice slower than `float` at least on my GPUs ... The `float=float(double(x)) ... ` trick I mentioned only to show you another idea for the `fnor` as you commented you still not get it... it can be implemented in more ways. btw right now I am in process of porting my GLSL Mandlebrot back to CPU side using the acceleration I posted in here (its done I just found out some inconsistencies in coordinate system in my GLSL version) once finished I post both demos in here for crossmatching. — Spektre, Mar 30 '21 at 10:35
@VivekanandV added link ... its two executables beware the CPU one rendering ~12sec for selected zoom and location and resolution on my machine... (no threads, win32) The GPU version finish around 123ms ... however it looks like the CPU one is slightly better (less round errors) but it might be just my imagination what is better ... you can look at the shader files one of them is using the dvec3 as high precision you can convert that to vec3 and floats to test ... Also the cpu version has much slower zoom change factor (without shift) to take advantage of the previous frame ... — Spektre, Mar 30 '21 at 11:49
I am on an Android device, I cannot access my windows machine, temporarily, because of personal reasons. Why don't you share a screen recorded video link/ GIF image, demonstrating your application...? — Vivekanand V, Mar 31 '21 at 02:37
Amazing!!! :D Speechless! :) I really love the bluish aura around the Mandelbrot, as well as the greenish branches. I think there is no other thing that is aesthetically pleasing than the Mandelbrot set! :) — Vivekanand V, Mar 31 '21 at 10:02
@VivekanandV that is simply combination of 8bit fixed point fractional escape and remap all used iteration counts linearly into this gradient of mine: [RGB values of visible spectrum](https://stackoverflow.com/a/22681410/2521214) that is why the colors shift slightly once the max iteration count is changed. The colors are nicer in real ... as the GIF capture code of mine dithers it to 8bpp colors — Spektre, Mar 31 '21 at 12:11
Cool. Also, while closely analyzing your images, I feel the GPU version is **slightly** more pixelated than your CPU version. But still, I prefer the GPU version, because GPU version will be more better for smooth control and efficiency! — Vivekanand V, Mar 31 '21 at 12:17
@VivekanandV Yes GPU is pixelated more slightly. The pixelation starts occuring somewhere between 10^13 and 10^14 zooms and I found out its most likely not related to the iterations itself more like to computation of the `C` (position on screen) inside HW interpolators of the GPU (stage between Vertex/Geometry and Fragment shaders). As when I implemented higher precision on GPU the pixelation did not change a bit.I want to try use high precision on CPU side in hope to veriffy my susspicions ... but also it might be just some bug in my implementation. Yes GPU is smooth.. — Spektre, Mar 31 '21 at 12:29
@VivekanandV. However CPU with incremental zoom using the above techniques is not that slow as it reuses large portion of last frame in most transitions ... — Spektre, Mar 31 '21 at 12:30
To balance, both speed and precision, I am planning to use a ```vec2``` to implement ```double``` . Is this the proper method to compare two emulated doubles: ```bool LEQ(vec2 a, vec2 b) { if (a.y > b.y) return false; if (a.x > b.x) return false; return true; }``` when ```a``` and ```b``` are normalized. Also, then my ```fnor``` would become: ```vec2 fnor(vec2 x) { vec2 c = x; if(abs(c.x)> 1e-5){c.y += c.x; c.x = 0.0; } return c; } ```.Is this correct ? Also, on the C++ application side, how can I convert an ordinary```double```, to an emulated double in GLSL, in prop. ranges? — Vivekanand V, Apr 05 '21 at 01:28
@VivekanandV yes but its not safe as the `x,y` components can have different sign ... much safer is to compute `c=a-b; return (c.x+c.y)<=0;` ... that is how `cmp` instructions/operations work anyway ... — Spektre, Apr 05 '21 at 07:06
Okay, while doing some googling, I found a blog dedicated to this stuff alone, that was written in 2011. Have a look: https://blog.cyclemap.link/2011-06-09-glsl-part2-emu/ .There, the author is implementing emulated double, but, the way the addition , subtraction and multiplications are implemented is extremely weird because of carry propagation calculations. To be honest, I did not understand that part at all ! :( . In the multiplication code, he is splitting using ```8193``` which I found is 2**13 +1 . He also does many other carry related stuff... But sadly, I can't comprehend! :( — Vivekanand V, Apr 05 '21 at 07:17
@VivekanandV interesting maybe I will give it a shot... btw I think I understand the addition ... `e` alone is not the carry !!! its just half of it ... if you do `c.x = a.x+b.x` the result is rounded and the difference is the carry so `cy=-(c.x-a.x-a.y)` In the code its just the stuff divided into 3 variables `e,t1,t2` — Spektre, Apr 05 '21 at 07:47
If I I'm using ```vec2``` to represent a 48 bit precision ```float```, then since float exponent ranges from -38 to 38, I should divide, ```(38 -(-38))/2``` which gives me, ```38```. So threshold of my ```fnor``` should be ```1```. Is that correct ? From CPU world, how would I transport a ```double``` or ```float``` to GPU world using this construct ? — Vivekanand V, Apr 07 '21 at 13:41
Okay, so, on the CPU side, should I implement an ```fnor``` to get the corresponding ```vec2``` components and then pass them to shader... Also, did the 2*float arithmetic work as expected for you? — Vivekanand V, Apr 07 '21 at 19:50
@VivekanandV In theory yes but in practice I see no difference in accuracy between `float` and `n*float` in passing data from vertex to fragment ... so either different mechanism is required for the interpolators (like change vertex coordinate 1.0 into (0.9999 + 0.0001) or something like that. Also having ranges cover full range of float will not boost the fragment accuracy a lot its better to have the ranges cover usable exponents for the equation iterated. I want to give a try to the other approach (your other question) will let you know there how it goes once multiplication is resolved. — Spektre, Apr 08 '21 at 06:54
I was successful in increasing the precision of the zoom upto ```4x``` using ```vec2``` compared to native float. But after that, I get, what I call "broken glass" pixelation... The only downside is that as we discussed before, my speed is reduced, and I've to halve the number of iterations, to get back my previous speed! Thankyou for your answer:) . But, there are still some conceptual stuff, that I personally need to wrap my mind around... What started out for me, as a simple rendering of the mandelbrot is taking me to places that I've never dreamed possible! — Vivekanand V, Apr 09 '21 at 06:11

score 2 · Answer 2 · answered Mar 19 '21 at 17:29

The optimized escape algorithm should be fast enough to draw the Mandelbrot set in real time. You can use multiple threads so that your implementation will be faster (this is very easy using OpenMP for example). You can also manually vectorize your code using SIMD instructions to make it even faster if needed. You could even run this directly on the GPU using either shaders and/or GPU computing frameworks (OpenCL or CUDA) if this is still not fast enough to you (although this is a bit complex to do efficiently). Finally you should tune the number of iterations so it is rather small.

Zooming should not have any direct impact on the performance. It just changes the input window of the computation. However, it does have an indirect impact since the actual number of iterations will change. Points outside the window should not be computed.

Double precision should also be enough for drawing Mandelbrot set correctly. But if you really want more precise calculation, you can use double-double precision which gives a quite good precision and not too bad performance. However, implementing double-double precision manually is a bit tricky and it is still significantly slower than using just double precision.

score 2 · Answer 3 · answered Mar 19 '21 at 19:35

My fastest solutions avoid iterating over large areas of the same depth by following a contour boundary and filling. There is a penalty that it is possible to nip off small buds instead of going around them, but all-in-all a small price to pay for a quick zoom.

One possible efficiency is that if a zoom doubles the scale, you already have ¼ of the points.

For animation, I file each frame's values, doubling the scale each time, and interpolate the in-between frames on playback in real time, so the animation doubles once per second. The double type allows more than 50 key frames to be stored, giving an animation that lasts more than a minute (in and then back out).

The actual iteration is done by hand-crafted assembler, so one pixel is iterated entirely in the FPU.

I use the 'escape time' algorithm. I've tried using 'distance from Mandelbrot set' but can't get better than the three main ideas I mention: a) re-use what you know, b) avoid iterating where possible, c) in those billions of iterations, every last clock cycle counts. I don't know a non-iterative approach. — Weather Vane, Mar 20 '21 at 08:47

What are the fastest algorithms for rendering the mandelbrot set?

3 Answers3

Linked