These problems are hard. Even humans will make errors.
For example, as I see it the image you provided has 13 blocks, not 12. You missed the block between the legs, just above 11.
If I am wrong, I would ask why the black (12) of the cup is counted, as it could also be the back of the cat (10).
Flood fill.
A flood fill algorithm can solve the problem. This answer has a simple flood fill algorithm written in JavaScript and using the Canvas 2D API. To use it, the image must not taint the canvas (it must be same-origin or served with appropriate CORS headers).
Note that you will also need a fill threshold (a color tolerance) if the image is anti-aliased or was encoded as JPEG (or with another lossy compression).
Note that this will only work for images with a few flat colors. Images containing gradients, or shapes that are counted as one but have many colors (due to shadows, lighting, highlights, reflections, etc.), cannot be counted using this method.
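As a rough illustration of what such a threshold could look like, the sketch below compares a pixel against a target color channel by channel. The function name `matchesColor`, the per-channel comparison, and the tolerance value of 8 are all assumptions made for this example; they are not taken from the linked flood fill.

```javascript
// Returns true when the RGBA pixel starting at byte offset `i` is within
// `tolerance` of `target` on every channel (hypothetical helper).
function matchesColor(data, i, target, tolerance) {
    return Math.abs(data[i]     - target[0]) <= tolerance &&
           Math.abs(data[i + 1] - target[1]) <= tolerance &&
           Math.abs(data[i + 2] - target[2]) <= tolerance &&
           Math.abs(data[i + 3] - target[3]) <= tolerance;
}

// Three RGBA pixels: pure red, slightly-off red (e.g. a JPEG artifact), blue.
const data = new Uint8ClampedArray([200, 30, 30, 255,  196, 34, 28, 255,  20, 20, 200, 255]);
const red = [200, 30, 30, 255];
console.log(matchesColor(data, 4, red, 8)); // true
console.log(matchesColor(data, 8, red, 8)); // false
```

A larger tolerance merges more edge pixels into the fill, but set too high it will also merge neighbouring blocks of similar color.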
To count blocks
Rather than filling with a color, fill with alpha = 0 (transparent). A transparent pixel then marks an area that has already been counted.
Steps
Let block count be the number of blocks found. Set it to 0
Start at the top-left pixel.
Repeat the following steps until you have reached the bottom-right pixel
Start search
If the pixel is not transparent
Apply the flood fill at that pixel
Add 1 to block count
Repeat from start search
If the pixel is transparent
Move right one pixel; if past the right edge, move down one row and start at the left
Repeat from start search
Once you have completed the steps you will have the number of separate items in the image.
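The steps above could be sketched roughly as follows. This is a minimal version working directly on an RGBA pixel array (the layout of `ImageData.data`), not the linked flood fill. The names `floodClear` and `countBlocks`, the 4-connected stack-based fill, and the per-channel `tolerance` parameter are assumptions for illustration. Note that the background is counted as a block too, and the pixel data is modified in place.

```javascript
// Flood fill from (x, y): every 4-connected pixel within `tolerance` of the
// start color (per channel) gets alpha = 0, marking it as counted.
function floodClear(data, width, height, x, y, tolerance) {
    const start = (y * width + x) * 4;
    const target = [data[start], data[start + 1], data[start + 2], data[start + 3]];
    const stack = [x, y]; // flat stack of x, y pairs
    while (stack.length > 0) {
        const py = stack.pop();
        const px = stack.pop();
        if (px < 0 || py < 0 || px >= width || py >= height) continue;
        const i = (py * width + px) * 4;
        if (data[i + 3] === 0) continue; // already transparent
        let same = true;
        for (let c = 0; c < 4; c++) {
            if (Math.abs(data[i + c] - target[c]) > tolerance) same = false;
        }
        if (!same) continue;
        data[i + 3] = 0;
        stack.push(px + 1, py, px - 1, py, px, py + 1, px, py - 1);
    }
}

// Scan left to right, top to bottom; each non-transparent pixel found
// starts a new block, which is then cleared by the flood fill.
function countBlocks(data, width, height, tolerance = 0) {
    let blockCount = 0;
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            if (data[(y * width + x) * 4 + 3] !== 0) {
                floodClear(data, width, height, x, y, tolerance);
                blockCount++;
            }
        }
    }
    return blockCount;
}

// Tiny 4x2 test image with rows "RRBB" and "RGBB".
const palette = { R: [255, 0, 0, 255], G: [0, 255, 0, 255], B: [0, 0, 255, 255] };
const pixels = new Uint8ClampedArray([..."RRBBRGBB"].flatMap(ch => palette[ch]));
console.log(countBlocks(pixels, 4, 2)); // 3
```

In a browser you would get the pixel array from the canvas, e.g. `const img = ctx.getImageData(0, 0, canvas.width, canvas.height)` and then `countBlocks(img.data, img.width, img.height, 16)`, where 16 is just an example tolerance.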
The flood fill algorithm can also easily give you the area of each block (count the number of pixels filled), as well as its size (width, height) and location (top, left, right, bottom).
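One possible way to collect these values is to update a small record as each pixel is filled. The sketch below assumes an RGBA pixel array, exact color matching (no tolerance), and a 4-connected fill; the name `floodFillStats` and the shape of the returned object are made up for this example.

```javascript
// Flood fill from (x, y), setting alpha = 0, and return the area and
// bounding box of the pixels that were filled.
function floodFillStats(data, width, height, x, y) {
    const s = (y * width + x) * 4;
    const r = data[s], g = data[s + 1], b = data[s + 2], a = data[s + 3];
    const stats = { area: 0, left: x, right: x, top: y, bottom: y };
    const stack = [x, y]; // flat stack of x, y pairs
    while (stack.length > 0) {
        const py = stack.pop();
        const px = stack.pop();
        if (px < 0 || py < 0 || px >= width || py >= height) continue;
        const i = (py * width + px) * 4;
        if (data[i + 3] === 0) continue; // already filled
        if (data[i] !== r || data[i + 1] !== g || data[i + 2] !== b || data[i + 3] !== a) continue;
        data[i + 3] = 0;
        stats.area++; // one more pixel in this block
        stats.left = Math.min(stats.left, px);
        stats.right = Math.max(stats.right, px);
        stats.top = Math.min(stats.top, py);
        stats.bottom = Math.max(stats.bottom, py);
        stack.push(px + 1, py, px - 1, py, px, py + 1, px, py - 1);
    }
    stats.width = stats.right - stats.left + 1;
    stats.height = stats.bottom - stats.top + 1;
    return stats;
}

// 4x3 white image with a 2x2 red square whose top-left corner is at (1, 1).
const W = [255, 255, 255, 255], R = [255, 0, 0, 255];
const image = new Uint8ClampedArray([
    W, W, W, W,
    W, R, R, W,
    W, R, R, W,
].flat());
const square = floodFillStats(image, 4, 3, 1, 1);
console.log(square.area, square.width, square.height); // 4 2 2
```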
The main remaining problem will be image noise (due to anti-aliasing and compression artifacts). This would give you many small disconnected blocks along color edges. Use the number of pixels in each fill to ignore fills with fewer than 100 or so pixels. In the image you provided the smallest block is around 400 pixels in area.
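As a small sketch of that filtering step, assume each flood fill has produced a record with an `area` property (the record shape and the name `countLargeBlocks` are hypothetical). The default of 100 is the rough cutoff suggested above and would need tuning for other images.

```javascript
// Count only the fills that are large enough to be real blocks.
// `blocks` is assumed to be an array of { area } records, one per flood fill.
function countLargeBlocks(blocks, minArea = 100) {
    return blocks.filter(block => block.area >= minArea).length;
}

// Two real blocks plus three slivers of edge noise.
const fills = [{ area: 5200 }, { area: 3 }, { area: 410 }, { area: 17 }, { area: 99 }];
console.log(countLargeBlocks(fills)); // 2
```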