Multithreaded & SIMD vectorized Mandelbrot in R using Rcpp & OpenMP

Question

As an OpenMP & Rcpp performance test I wanted to check how fast I could calculate the Mandelbrot set in R using the most straightforward and simple Rcpp+OpenMP implementation. Currently what I did was:

#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]

using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericMatrix mandelRcpp(const double x_min, const double x_max, const double y_min, const double y_max,
                         const int res_x, const int res_y, const int nb_iter) {
  Rcpp::NumericMatrix ret(res_x, res_y);
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  int r,c;
#pragma omp parallel for default(shared) private(c) schedule(dynamic,1) collapse(2)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      int n = 0;
      for (n=0;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }
      ret(c,r) = n;
    }
  }
  return ret;
}

And then in R:

library(Rcpp)
sourceCpp("mandelRcpp.cpp")
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=10000L;
system.time(m <- mandelRcpp(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter)) 
# 0.92s
rainbow=c(rgb(0.47,0.11,0.53),rgb(0.27,0.18,0.73),rgb(0.25,0.39,0.81),rgb(0.30,0.57,0.75),rgb(0.39,0.67,0.60),rgb(0.51,0.73,0.44),rgb(0.67,0.74,0.32),rgb(0.81,0.71,0.26),rgb(0.89,0.60,0.22),rgb(0.89,0.39,0.18),rgb(0.86,0.13,0.13))
    cols=c(colorRampPalette(rainbow)(100),rev(colorRampPalette(rainbow)(100)),"black") # palette
par(mar=c(0, 0, 0, 0))
system.time(image(m^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)) 
# 0.5s

I was unsure though if there is any other obvious speed improvements I could take advantage of aside from OpenMP multithreading, e.g. via simd vectorization? (using simd options in the openmp #pragma didn't seem to do anything)

PS at first my code was crashing but I later found this was solved by replacing ret[r,c] = n; with ret(r,c) = n; Using Armadillo classes as suggested in the answer below make things very slightly faster, though the timings are almost the same. Also flipped around x and y so it comes out in the right orientation when plotted with image(). Using 8 threads speed is ca. 350 times faster than the vectorized plain R Mandelbrot version here and also about 7.3 times faster than the (non-multithreaded) Python/Numba version here (similar to PyCUDA or PyOpenCL speeds), so quite happy with that... Rasterizing/display now seems the bottleneck in R....

Generally, I made speed gains (C with assembler iteration) by avoiding iteration within same-contour areas, and on the M-Set. Away from the M-Set boundary, large areas are contained within a contour, and I developed a curve-stitching method to follow a contour boundary, which would then be filled. The deeper the iteration, the better the gain. There can be a penalty when a bud is snipped off accidentally, and I don't see how this approach would work when using threads. Another saving can be found when following a doubling zoom, where 1/4 of the points are already known. — Weather Vane, Jan 03 '18 at 19:13
Yes but on the other hand I was planning to moving to continuous colouring, in which the first type of optimization would not be so straightforward anymore. Reusing pixels already calculated when zooming I was planning to do... At high zooms strategies like this, https://en.wikipedia.org/wiki/Mandelbrot_set#Perturbation_theory_and_series_approximation, can greatly benefit performance. But my main question was more centered on my Rcpp code as such, and less on the further algorithmic optimizations one could do, which are numerous of course.... And in R the main bottleneck seems just the display — Tom Wenseleers, Jan 03 '18 at 20:22
I never filled contour areas with colour, only with iterations. Colouring algorithms are another matter. — Weather Vane, Jan 03 '18 at 20:34
Well not really because one then no longer uses the simple escape time algo, and one no gets continuous numbers back as opposed to fixed nrs of iterations, as explained in https://en.wikipedia.org/wiki/Mandelbrot_set#Continuous_(smooth)_coloring — Tom Wenseleers, Jan 03 '18 at 20:36
See here for a Python code example: https://www.ibm.com/developerworks/community/blogs/jfp/entry/My_Christmas_Gift?lang=en of the two approaches... — Tom Wenseleers, Jan 03 '18 at 20:39
Thanks for your responses. It's been a while but maybe I should study your links and revisit my screensaver. About colouring smooth contours: the eye spots even small changes in colour. About colouring close to the M-Set boundary: I tried many different colouring algorithms based on the *difference* with neighbours to pick out a clear detail. About a playback: I interpolated in-between frames from key images stored, and a different colouring algorithm is needed, to filter out pixel flicker near the boundary. — Weather Vane, Jan 03 '18 at 20:49
You can get a big improvement by using SIMD. You calculate multiple pixels in parallel using SSE2 or AVX2 or AVX512. Are you interested in this approach? — Z boson, Jan 04 '18 at 09:37
Yes I'd love to see how to get that to work in my code! (If it can be done without using inline assembly :-) ) — Tom Wenseleers, Jan 04 '18 at 09:39

coatless · Accepted Answer · 2018-01-03T02:13:47.383

6

Do not use OpenMP with Rcpp's *Vector or *Matrix objects as they mask SEXP functions / memory allocations that are single-threaded. OpenMP is a multi-threaded approach.

This is why the code is crashing.

One way to get around this limitation is to use a non-R data structure to store the results. One of the following will be sufficient: arma::mat or Eigen::MatrixXd or std::vector<T>... As I favor armadillo, I will change the res matrix to arma::mat from Rcpp::NumericMatrix. Thus, the following will execute your code in parallel:

#include <RcppArmadillo.h> // Note the changed include and new attribute
// [[Rcpp::depends(RcppArmadillo)]]

// Avoid including header if openmp not on system
#ifdef _OPENMP
#include <omp.h>
#endif
// [[Rcpp::plugins(openmp)]]

// Note the changed return type
// [[Rcpp::export]]
arma::mat mandelRcpp(const double x_min, const double x_max,
                     const double y_min, const double y_max,
                     const int res_x, const int res_y, const int nb_iter) {
  arma::mat ret(res_x, res_y); // note change
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  unsigned r,c;

  #pragma omp parallel for shared(res)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      unsigned n = 0;
      for (;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }

      if(n == nb_iter) {
        n = 0;
      }

      ret(r, c) = n;
    }
  }

  return ret;
}

With the test code (note y and x were not defined, thus I assumed y = ylims and x = xlims) we have:

xlims = ylims = c(-2.0, 2.0)

x_res = y_res = 400L
nb_iter = 256L

system.time(m <-
              mandelRcpp(xlims[[1]], xlims[[2]],
                         ylims[[1]], ylims[[2]], 
                         x_res, y_res, nb_iter))

rainbow = c(
  rgb(0.47, 0.11, 0.53),
  rgb(0.27, 0.18, 0.73),
  rgb(0.25, 0.39, 0.81),
  rgb(0.30, 0.57, 0.75),
  rgb(0.39, 0.67, 0.60),
  rgb(0.51, 0.73, 0.44),
  rgb(0.67, 0.74, 0.32),
  rgb(0.81, 0.71, 0.26),
  rgb(0.89, 0.60, 0.22),
  rgb(0.89, 0.39, 0.18),
  rgb(0.86, 0.13, 0.13)
)

cols = c(colorRampPalette(rainbow)(100),
         rev(colorRampPalette(rainbow)(100)),
         "black") # palette
par(mar = c(0, 0, 0, 0))

image(m,
      col = cols,
      asp = diff(range(ylims)) / diff(range(xlims)),
      axes = F)

For:

edited Jan 03 '18 at 02:13

answered Jan 03 '18 at 02:04

coatless

20,011
13
69
84

Ha many thanks for that! In the meantime I found that using ret(r,c) = n; instead of ret[r,c] = n; (and adding return ret; which I had stupidly forgotten) does produce the right results - will test which is fastest though! Would you also know if I can add simd options in the #pragma ? And if I would benefit from defining more variables as private? – Tom Wenseleers Jan 03 '18 at 02:13
2

You can use the new SIMD construct on these objects. Regarding private variables, well... Those variables are private. Thus, you are explicitly creating a separate copy in the memory of each thread for each private variable. Not sure there would be a gain. – coatless Jan 03 '18 at 02:21
Ha yes I understand now - thanks for that! I played around with #pragma omp parallel for simd #pragma omp for simd #pragma omp simd but none of these seemed to help performance here... – Tom Wenseleers Jan 03 '18 at 10:32
@TomWenseleers you need to vectorize by hand. This kind of optimization is too advanced for the compiler. You have to hold the pixels that finish before others and use a mask to find when all are done and then move to the next. – Z boson Jan 04 '18 at 09:40
Sounds cool - I'd love to see that working if it can be done in a reasonably elegant way! – Tom Wenseleers Jan 04 '18 at 09:41
@TomWenseleers what compiler are you using? I can give you an elegant solution in GCC or Clang and maybe ICC not with MSVC – Z boson Jan 04 '18 at 10:48
I'm using gcc 4.9.3 – Tom Wenseleers Jan 04 '18 at 10:56
@coatless Also did an attempt to implement smooth/continuous colouring by using large escape horizon, but even though the code is almost the same as before that one crashes - it just uses log, sqrt and pow from math.h - any thoughts what I'm doing wrong? – Tom Wenseleers Jan 04 '18 at 11:01
@z-boson Would this https://gallery.rcpp.org/articles/rcppnt2-introduction/ be useful actually to use SIMD in an Rcpp context? – Tom Wenseleers Jan 04 '18 at 11:49
@TomWenseleers, I would just you vector extensions. See the table I made here https://stackoverflow.com/a/43778723/2542702 – Z boson Jan 04 '18 at 12:03
I just came across this minimal Mandelbrot calculation using SIMD instructions, using the Quickvec library - that seems like quite an elegant approach in that the code then requires only rather minimal change : https://www.andrew.cmu.edu/user/mkellogg/15-418/final.html#Performance What do you think? – Tom Wenseleers Jan 08 '18 at 11:27
@TomWenseleers, I added an answer showing how to vectorize your code with GCC's and Clang's vector extensions. – Z boson Jan 16 '18 at 14:39

score 5 · Answer 2 · edited Jan 24 '18 at 16:52

I went ahead and vectorized the OP's code using GCC's and Clang's vector extensions. Before I show how I did this let me show the performance with the following hardware:

Skylake (SKL) at 3.1 GHz with 4 cores
Knights Landing (KNL) at 1.5 GHz with 68 cores
ARMv8 Cortex-A57 arch64 (Nvidia Jetson TX1) 4 cores at ? GHz

nb_iter = 1000000
                        GCC             Clang
SKL_scalar              6m5,422s
SKL_SSE41               3m18,058s
SKL_AVX2                1m37,843s       1m39,943s
SKL_scalar_omp          0m52,237s
SKL_SSE41_omp           0m29,624s       0m31,356s
SKL_AVX2_omp            0m14,156s       0m16,783s

ARM_scalar              15m28.285s
ARM_vector              9m26.384s
ARM_scalar_omp          3m54.242s
ARM_vector_omp          2m21.780s

KNL_scalar              19m34.121s
KNL_SSE41               11m30.280s
KNL_AVX2                5m0.005s        6m39.568s
KNL_AVX512              2m40.934s       6m20.061s
KNL_scalar_omp          0m9.108s
KNL_SSE41_omp           0m6.666s        0m6.992s
KNL_AVX2_omp            0m2.973s        0m3.988s
KNL_AVX512_omp          0m1.761s        0m3.335s

The theoretical speed up of KNL vs. SKL is

(68 cores/4 cores)*(1.5 GHz/3.1 Ghz)*
(8 doubles per lane/4 doubles per lane) = 16.45

I went into detail about GCC's and Clang's vector extensions capabilities here. To vectorize the OP's code here are three additional vector operations that we need to define.

1. Broadcasting

For a vector v and a scalar s GCC cannot do v = s but Clang can. But I found a nice solution which works for GCC and Clang here. For example

vsi v = s - (vsi){};

2. A any() function like in OpenCL or like in R.

The best I came up with is a generic function

static bool any(vli const & x) {
  for(int i=0; i<VLI_SIZE; i++) if(x[i]) return true;
  return false;
}

Clang actually generates relatively efficient code for this using the ptest instruction (but not for AVX512) but GCC does not.

3. Compression

The calculations are done as 64-bit doubles but the result is written out as 32-bit integers. So two calculations are done using 64-bit integers and then the two calculations are compressed into one vector of 32-bit integers. I came up with a generic solution which Clang does a good job with

static vsi compress(vli const & lo, vli const & hi) {
  vsi lo2 = (vsi)lo, hi2 = (vsi)hi, z;
  for(int i=0; i<VLI_SIZE; i++) z[i+0*VLI_SIZE] = lo2[2*i];
  for(int i=0; i<VLI_SIZE; i++) z[i+1*VLI_SIZE] = hi2[2*i];
  return z;
}

The follow solution works better for GCC but is no better for Clang. But since this function is not critical I just use the generic version.

static vsi compress(vli const & low, vli const & high) {
#if defined(__clang__)
  return __builtin_shufflevector((vsi)low, (vsi)high, MASK);
#else
  return __builtin_shuffle((vsi)low, (vsi)high, (vsi){MASK});
#endif
}

These definitions don't rely on anything x86 specific and the code (defined below) compiles for ARM processors as well with GCC and Clang.

Now that these are defined here is the code

#include <string.h>
#include <inttypes.h>
#include <Rcpp.h>

using namespace Rcpp;

#ifdef _OPENMP
#include <omp.h>
#endif
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::plugins(cpp14)]]

#if defined ( __AVX512F__ ) || defined ( __AVX512__ )
static const int SIMD_SIZE = 64;
#elif defined ( __AVX2__ )
static const int SIMD_SIZE = 32;
#else
static const int SIMD_SIZE = 16;
#endif

static const int VSI_SIZE = SIMD_SIZE/sizeof(int32_t);
static const int VLI_SIZE = SIMD_SIZE/sizeof(int64_t);
static const int VDF_SIZE = SIMD_SIZE/sizeof(double);

#if defined(__clang__)
typedef int32_t vsi __attribute__ ((ext_vector_type(VSI_SIZE)));
typedef int64_t vli __attribute__ ((ext_vector_type(VLI_SIZE)));
typedef double  vdf __attribute__ ((ext_vector_type(VDF_SIZE)));
#else
typedef int32_t vsi __attribute__ ((vector_size (SIMD_SIZE)));
typedef int64_t vli __attribute__ ((vector_size (SIMD_SIZE)));
typedef double  vdf __attribute__ ((vector_size (SIMD_SIZE)));
#endif

static bool any(vli const & x) {
  for(int i=0; i<VLI_SIZE; i++) if(x[i]) return true;
  return false;
}

static vsi compress(vli const & lo, vli const & hi) {
  vsi lo2 = (vsi)lo, hi2 = (vsi)hi, z;
  for(int i=0; i<VLI_SIZE; i++) z[i+0*VLI_SIZE] = lo2[2*i];
  for(int i=0; i<VLI_SIZE; i++) z[i+1*VLI_SIZE] = hi2[2*i];
  return z;
}

// [[Rcpp::export]]
IntegerVector frac(double x_min, double x_max, double y_min,  double y_max, int res_x, int res_y, int nb_iter) {
  IntegerVector out(res_x*res_y);
  vdf x_minv = x_min - (vdf){}, y_minv = y_min - (vdf){};
  vdf x_stepv = (x_max - x_min)/res_x - (vdf){}, y_stepv = (y_max - y_min)/res_y - (vdf){};
  double a[VDF_SIZE] __attribute__ ((aligned(SIMD_SIZE)));
  for(int i=0; i<VDF_SIZE; i++) a[i] = 1.0*i;
  vdf vi0 = *(vdf*)a;

  #pragma omp parallel for schedule(dynamic) collapse(2)
  for (int r = 0; r < res_y; r++) {
    for (int c = 0; c < res_x/(VSI_SIZE); c++) {
      vli nv[2] = {0 - (vli){}, 0 - (vli){}};
      for(int j=0; j<2; j++) {
        vdf c2 = 1.0*VDF_SIZE*(2*c+j) + vi0;
        vdf zx = 0.0 - (vdf){}, zy = 0.0 - (vdf){}, new_zx;
        vdf cx = x_minv + c2*x_stepv, cy = y_minv + r*y_stepv;
        vli t = -1 - (vli){};
        for (int n = 0; any(t = zx*zx + zy*zy < 4.0) && n < nb_iter; n++, nv[j] -= t) {
          new_zx = zx*zx - zy*zy + cx;
          zy = 2.0*zx*zy + cy;
          zx = new_zx;
        }
      }
      vsi sp = compress(nv[0], nv[1]);
      memcpy(&out[r*res_x + VSI_SIZE*c], (int*)&sp, SIMD_SIZE);
    }
  }
  return out;
}

The R code is almost the same as the OP's code

library(Rcpp)
sourceCpp("frac.cpp", verbose=TRUE, rebuild=TRUE)                                                                                                                                                         
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=100000L;

t = system.time(m <- frac(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter))
print(t)
m2 = matrix(m, ncol = x_res)

rainbow = c(
  rgb(0.47, 0.11, 0.53),
  rgb(0.27, 0.18, 0.73),
  rgb(0.25, 0.39, 0.81),
  rgb(0.30, 0.57, 0.75),
  rgb(0.39, 0.67, 0.60),
  rgb(0.51, 0.73, 0.44),
  rgb(0.67, 0.74, 0.32),
  rgb(0.81, 0.71, 0.26),
  rgb(0.89, 0.60, 0.22),
  rgb(0.89, 0.39, 0.18),
  rgb(0.86, 0.13, 0.13)
)

cols = c(colorRampPalette(rainbow)(100),
         rev(colorRampPalette(rainbow)(100)),"black") # palette                                                                                                                  
par(mar = c(0, 0, 0, 0))
image(m2^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)

To compile for GCC or Clang change the file ~/.R/Makevars to

CXXFLAGS= -Wall -std=c++14 -O3 -march=native -ffp-contract=fast -fopenmp
#uncomment the following two lines for clang    
#CXX=clang-5.0
#LDFLAGS= -lomp

If you are having trouble getting OpenMP to work for Clang see this.

The code produces more or less the same image.

Thanks millions for the detailed tutorial - very instructive and elegant! I also just found https://github.com/bisqwit/cpp_parallelization_examples/blob/master/README.md https://www.youtube.com/watch?v=Pc8DfEyAxzg https://www.youtube.com/watch?v=MfEkOcMILDo https://www.youtube.com/watch?v=pCoxpKTmykA&t=232s which still has some optimizations on the algorithmic side, but much less elegant code-wise I think... — Tom Wenseleers, Jan 16 '18 at 15:54
@TomWenseleers I don't want the accepted vote. Can you give it back to the original person. — Z boson, Jan 17 '18 at 12:37
@TomWenseleers, Those links and videos are awesome! There is clearly a lot of overlap in what we have done. I actually wrote this with intrinsics a few years ago and put it into SDL for real time rending. I also wrote it for OpenCL for the GPU. Additionally, I have implemented it for double-double computations to increase the precession because even with double you quickly run out of resolution zooming in. https://stackoverflow.com/questions/30573443/optimize-for-fast-multiplication-but-slow-addition-fma-and-doubledouble — Z boson, Jan 17 '18 at 12:39
@TomWenseleers I mostly wrote this answer because I have been wanted to test the vector extensions for a while. With the exception of the `all` function and maybe slightly the `compress` function intrinsics should not be necessary for performance at least with x86 but explicit vectorization is. I test my code on ARM for the first time yesterday. I have not looked at the ARM assembly but at least I got a nice speed up even on ARM. — Z boson, Jan 17 '18 at 12:41
The GHz of my ARM system might be 1.9 GHz https://en.wikipedia.org/wiki/Tegra#Tegra_X1. I don't know if that's for all cores or just core or it even has anything like Turbo. For SKL OpenMP used 8 threads and KNL used 272 threads (68 cores times 4 threads per core). The Tegra_X1 used four threads for the for Cortex-A57 cores (the Cortex-A53 cores are not accessible by the programmer as far as I know). — Z boson, Jan 17 '18 at 13:31
Thanks for the extra comments & clarifications. Do you still have your SDL rendering code as well by any chance? Was just wondering, because there is an R wrapper to use SDL(&GL/OpenGL) from R (http://hg.dyncall.org/pub/dyncall/bindings/file/87fd9f34eaa0/R/rdyncall/demo/00Index, https://cran.r-project.org/src/contrib/Archive/rdyncall/ ) and I was curious to see if I could get real-time rendering going all from R? With double precious one indeed quickly runs out of accuracy for deep zooms - but see https://en.wikipedia.org/wiki/Mandelbrot_set#Perturbation_theory_and_series_approximation — Tom Wenseleers, Jan 17 '18 at 15:13
@TomWenseleers yes I still have the SDL code. I have code for Qt as well. I have not look at it in a few years. It's on bitbucket. I will get back to you. Thanks for the commit about perturbation theory. I have not thought about perturbation theory since studying quantum mech. — Z boson, Jan 17 '18 at 15:33
@TomWenseleers, here is the vector library I originally use (and still use often) http://www.agner.org/optimize/#vectorclass. I highly recommend that for x86. But the vector extensions I think might be good enough except for the `any` function and maybe `compress`. — Z boson, Jan 17 '18 at 15:34
@TomWenseleers your comment in the latest edit says it makes it work on Windows. What compiler are you using on Windows? Visual C++ and Intel C++ on Windows don't support GCC like vector extensions so my code should completely fail for those two compilers on Windows. — Z boson, Jan 29 '18 at 08:40
Ha using gcc 4.9.3, included with mingw-w64 v3, which is part of https://cran.r-project.org/bin/windows/Rtools/ (Rtools34.exe)! It works fine with this standard R+RStudio+Rtools windows installation at least! (But including // [[Rcpp::plugins(cpp14)]] seemed to be required in the cpp file, so I added that in your code) — Tom Wenseleers, Jan 29 '18 at 13:25
@TomWenseleers the inner loop over `j` is not efficient. If you unroll it (it's only 2 iterations) you should see a performance improvement. I originally ,did not do this because it makes the code longer and I did not think it would matter. Now that I have tested unrolling I know for a fact that it matters. Also, you should write a custom `all` function with intrinsics. If you do both of those you should see a noticeable performance improvement. — Z boson, Jan 31 '18 at 11:52
Thanks for letting me know - feel free to still update the code in your post! Main thing I would still be keen on trying would be to use perturbation theory to allow deeper zooms, as in http://www.superfractalthing.co.nf/sft_maths.pdf, and/or to move to using https://gmplib.org/ for arbitrary precision calculations (for calculating the reference point(s))... — Tom Wenseleers, Feb 02 '18 at 13:29
@TomWenseleers, sadly I could not find the SDL version but I did find the qt5 version https://bitbucket.org/zboson/fractalqt. You need qt5 and cmake. I have not made a readme. Read the file gui.cpp to see the keyboard and mouse commands. I have not looked at this code since 2014. The SDL version was better in many ways though. — Z boson, Feb 02 '18 at 14:36
@TomWenseleers, the paper by Martin you pointed me to makes some mistakes. I figure that out on my own and then found this link https://math.stackexchange.com/questions/939270/perturbation-of-mandelbrot-set-fractal — Z boson, Feb 05 '18 at 08:46
@TomWenseleers, I got the perturbation theory working last night. I still need to find a clever way to find a point which is in the set. But once I have a point in the set applying the perturbation is working. Even using float instead of double I get much better accuracy now. — Z boson, Feb 06 '18 at 08:40
Ha that's great news - can't wait to see that! Thanks also for the qt5 version! For perturbation theory based rendering I know that Kalles Fraktaler and MandelMachine are two working implementations, but neither are very didactical as the source code is really long. And I know that they use clever rendering glitch detection like http://www.fractalforums.com/announcements-and-news/pertubation-theory-glitches-improvement/ Would that algo work for you? — Tom Wenseleers, Feb 06 '18 at 09:25
@TomWenseleers, okay I found some interesting seed points. `c = -2` and `c = -1.54369`. No imaginary component. These are Preperiodic (Misiurewicz) points https://www.ibiblio.org/e-notes/MSet/cperiod.htm. Using these points I can zoom into the limits of single and double precision. It's amazing! With double it feels like it zooms forever. I can write up an answer. — Z boson, Feb 06 '18 at 12:56
@TomWenseleers, I can answer [this question](https://stackoverflow.com/q/25640013/2542702) and or [this question](https://stackoverflow.com/q/43118611/2542702). I might do that tomorrow. — Z boson, Feb 06 '18 at 12:58
@TomWenseleers, BTW, where did you get the formula for coloring your fractal `m2^(1/7)`? — Z boson, Feb 06 '18 at 13:01
Thanks that sounds really great and interesting! For the colors this was just a simple gamma colour transform to equalize the colors a bit - the best gamma coefficient to use can vary a bit though. To avoid that I switched in the end to using histogram equalization, https://en.wikipedia.org/wiki/Histogram_equalization, as that always returns a pleasing colour gradient... Using smooth shading as in http://www.fractalforums.com/fractal-exteme/smooth-shading-mandelbrot-plugin/ or https://www.ibm.com/developerworks/community/blogs/jfp/entry/My_Christmas_Gift?lang=en is also nice. — Tom Wenseleers, Feb 06 '18 at 14:50
Was wondering if you ever wrote up your work on how to use those perturbation methods to calculate the Mandelbrot set? Was just curious as I just published a small repo with an interactive Mandelbrot viewer: https://github.com/tomwenseleers/mandelExplorer — Tom Wenseleers, Dec 22 '22 at 18:25
@TomWenseleers, your Mandelbrot repo looks really cool! I did not write up something about the perturbation method. I think the link I used is broken now. Let me see if I can find a working link. — Z boson, Dec 24 '22 at 17:23
https://math.stackexchange.com/questions/939270/perturbation-of-mandelbrot-set-fractal — Z boson, Dec 24 '22 at 17:26

Multithreaded & SIMD vectorized Mandelbrot in R using Rcpp & OpenMP

2 Answers2

Linked