The reason why you are not seeing obvious performance gains from the Quadtree is because of the nature of your simulation. Currently, the default separation causes a large number of boids to "converge" to the same position.
Lots of objects in the same position is going to negate possible speedups due to spatial partitioning. If all the objects are in the same or near position, boids in the area a forced to check against all other boids in the area.
You can demonstrate to yourself that your Quadtree is working by watching or profiling your application with its default settings. Now turn separation up to maximum. You will see visually, or through profiling, that as the boids spread out more evenly, the FPS increases significantly. This is because the Quadtree can now prevent computations thanks to its spatial partitioning.
With default low separation:

With maximum separation:

You can see how performance is increased in the second image. Also note, that the conjecture by another commentor that it is the construction of the Quadtree (insert
) that is taking up all the time is wrong.
While in some applications you may be able to update a Quadtree as things move around, since in this simulation every constituent moves every frame, reconstructing the Quadtree from scratch is less work, then taking every object out and reinserting it into its new position.
The advice to skip square-rooting and just use the distance-squared is good though as this will get you a bit more performance.