How to digitize (extract data from) a heat map image using Python?

Question

There are several packages available to digitize line graphs, e.g. GetData Graph Digitizer.
However, for the digitization of heat maps I could not find any packages or programs.

I want to digitize heat maps (images in PNG or JPG format) using Python. How can I do it?
Do I need to write the entire code from scratch,
or are there any packages available?

You want to digitize it *from what format*? Do you have it as a paper image or what happened to it? I can see it on my screen, I'm pretty sure my screen is not made from paper? — Andrey Tyukin, Mar 25 '18 at 00:56
What's the problem with png or jpg images? Those are both digital formats. Do you want to get the numbers back from the colors or what? — Andrey Tyukin, Mar 25 '18 at 01:00
Apologies. Yes. I need to get the data of the plot. Will edit the question. — Neeraj Hanumante, Mar 25 '18 at 01:02
Yes, editing your question is a good idea, it seems completely unclear as it is now. You probably should remove the `digitization` tag, it doesn't seem to have anything to do with the problem. — Andrey Tyukin, Mar 25 '18 at 01:04
For digitization of a heatmap, you can refer to [this answer](https://stackoverflow.com/questions/30961464/saving-heatmap-using-pylab) — Arjun Sehajpal, Mar 25 '18 at 01:10

marianstefi20 · Accepted Answer · 2018-11-06T06:45:53.623

There are multiple ways to do it, some easier than others, and many machine learning libraries offer custom visualization functions.

You need to split the problem in half.

First, load the image as a matrix using OpenCV for Python or scikit-image. You can then set some offsets so that reading starts right at the beginning of the cells.

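import cv2

# 1 - read the color image (3 color channels, stored in BGR order);
# cv2.imread returns None instead of raising if the file cannot be read
image = cv2.imread('test.jpg', cv2.IMREAD_COLOR)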
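With the color flag, cv2.imread returns a NumPy array of shape (height, width, 3) with dtype uint8 and the channels in BGR order. If you want to normalise the colors, a minimal sketch could look like the following, assuming you want RGB values in [0, 1]. normalise_bgr is a hypothetical helper name, and a small synthetic array stands in for the loaded image so the snippet runs without a file:

```python
import numpy as np

def normalise_bgr(image_bgr):
    """Hypothetical helper: convert a uint8 BGR image (as from cv2.imread) to float RGB in [0, 1]."""
    if image_bgr is None:
        raise ValueError("image could not be read")
    # reverse the channel axis: BGR -> RGB
    image_rgb = image_bgr[..., ::-1]
    # scale 0..255 integers to 0.0..1.0 floats
    return image_rgb.astype(np.float64) / 255.0

# synthetic 2x2 "image" with one pure-blue pixel (BGR order) at row 0, column 0
fake = np.zeros((2, 2, 3), dtype=np.uint8)
fake[0, 0] = (255, 0, 0)  # B=255, G=0, R=0
rgb = normalise_bgr(fake)
print(rgb[0, 0])  # [0. 0. 1.]
```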

Then, you iterate through the cells and read the color inside each one. You can normalise the result if you want. We introduce offsets because the heatmap doesn't start in the top-left corner of the original image at (0,0). offset_x and offset_y are lists with 2 values each:

offset_x[0]: the offset from the left edge of the image to the beginning of the heatmap (i.e. start_of_heatmap_x)
offset_x[1]: the offset from the right edge of the image to the end of the heatmap (i.e. image_width - end_of_heatmap_x)
offset_y[0]: the offset from the top edge of the image to the beginning of the heatmap (i.e. start_of_heatmap_y)
offset_y[1]: the offset from the bottom edge of the image to the end of the heatmap (i.e. image_height - end_of_heatmap_y)
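If you have measured the pixel coordinates of the heatmap's edges (for example by hovering over its corners in an image viewer), the four offsets follow directly from these definitions. A minimal sketch, where offsets_from_box is a hypothetical helper, the end coordinates are assumed to be one past the last heatmap pixel, and the numbers are made-up example values:

```python
def offsets_from_box(image_width, image_height, start_x, end_x, start_y, end_y):
    """Hypothetical helper: turn heatmap edge coordinates into offset lists.

    start_x/start_y are the first pixel column/row of the heatmap;
    end_x/end_y are assumed to be one past its last column/row.
    """
    offset_x = [start_x, image_width - end_x]
    offset_y = [start_y, image_height - end_y]
    return offset_x, offset_y

# example: a 400x300 image whose heatmap spans x in [50, 350) and y in [20, 260)
offset_x, offset_y = offsets_from_box(400, 300, 50, 350, 20, 260)
print(offset_x, offset_y)  # [50, 50] [20, 40]
```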

Each cell is sampled at its center: starting from the heatmap's top-left corner, the loops step by cell_size and add cell_size/2 to each cell's local coordinates. The stop value of each range is the start of the last full cell plus one, so the last row and column are included. Note that NumPy images are indexed as image[row, column], i.e. image[y, x].

def read_as_digital(image, cell_size, offset_x, offset_y):
    # grab the image dimensions (height = number of rows, width = number of columns)
    h, w = image.shape[:2]
    half = cell_size // 2
    results = []
    # loop over the heatmap, cell by cell; "+ 1" keeps the last full cell
    for y in range(offset_y[0], h - offset_y[1] - cell_size + 1, cell_size):
        row = []
        for x in range(offset_x[0], w - offset_x[1] - cell_size + 1, cell_size):
            # append the color at the center of the cell (indexed [y, x])
            row.append(image[y + half, x + half])
        results.append(row)

    # return a list of rows, each holding one color per cell
    return results
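A quick way to check the function is to run it on a synthetic image whose cell colors are known. This sketch repeats the corrected read_as_digital so it runs on its own, builds a 3x4-cell heatmap with a 10-pixel white margin using NumPy (the sizes and gray levels are arbitrary test values), and checks that every cell, including the last row and column, is recovered:

```python
import numpy as np

def read_as_digital(image, cell_size, offset_x, offset_y):
    h, w = image.shape[:2]
    half = cell_size // 2
    results = []
    for y in range(offset_y[0], h - offset_y[1] - cell_size + 1, cell_size):
        row = []
        for x in range(offset_x[0], w - offset_x[1] - cell_size + 1, cell_size):
            # sample the center of each cell; NumPy indexes [row, column] = [y, x]
            row.append(image[y + half, x + half])
        results.append(row)
    return results

cell_size, margin = 20, 10
rows, cols = 3, 4
# white canvas with a rows x cols grid of cells, each filled with a distinct gray level
image = np.full((rows * cell_size + 2 * margin, cols * cell_size + 2 * margin, 3),
                255, dtype=np.uint8)
for r in range(rows):
    for c in range(cols):
        y0, x0 = margin + r * cell_size, margin + c * cell_size
        image[y0:y0 + cell_size, x0:x0 + cell_size] = 10 * (r * cols + c)

cells = read_as_digital(image, cell_size, [margin, margin], [margin, margin])
grid = [[int(color[0]) for color in row] for row in cells]
print(grid)  # [[0, 10, 20, 30], [40, 50, 60, 70], [80, 90, 100, 110]]
```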

Extracting the legend information (the values along the x and y axes) is not hard, because we can derive the values from the limits (although this only applies to linear scales).

For example, we can derive the step between consecutive values on each axis (x and y).

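def generate_legend(length, offset, cell_size, legend_start, legend_end):
    # length is the image width (x axis) or height (y axis); count whole cells only
    nr_of_cells = (length - offset[0] - offset[1]) // cell_size
    step_size = (legend_end - legend_start) / nr_of_cells

    # one value per cell, half a step in from the start to center it on the cell;
    # computed directly instead of by repeated addition to avoid float drift
    values = []
    for k in range(nr_of_cells):
        values.append(legend_start + step_size * (k + 0.5))
    return values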
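As an illustration with made-up numbers: a 340-pixel-wide image whose heatmap spans 300 pixels of 20-pixel cells, with an x axis running from 0 to 15, gives one value per cell at its center. This sketch repeats the integer-cell-count version of generate_legend so it runs on its own, then pairs axis values with an already-read grid of cell values to form (x, y, value) rows. The tiny grid is hypothetical. For the y axis, pass the value at the top of the heatmap as legend_start, since image rows count downward:

```python
def generate_legend(length, offset, cell_size, legend_start, legend_end):
    # whole cells along this axis
    nr_of_cells = (length - offset[0] - offset[1]) // cell_size
    step_size = (legend_end - legend_start) / nr_of_cells
    # value at the center of each cell
    return [legend_start + step_size * (k + 0.5) for k in range(nr_of_cells)]

# made-up example: image 340 px wide, heatmap from x=20 to x=320, 20 px cells, axis 0..15
x_values = generate_legend(340, [20, 20], 20, 0, 15)
print(len(x_values), x_values[:3])  # 15 [0.5, 1.5, 2.5]

# hypothetical 2 x 3 grid of cell values, already read from a 90x60 px heatmap
cells = [[1, 2, 3], [4, 5, 6]]
x_small = generate_legend(90, [0, 0], 30, 0, 3)  # one value per column
y_small = generate_legend(60, [0, 0], 30, 0, 2)  # one value per row, top first
table = [(x, y, cells[r][c])
         for r, y in enumerate(y_small)
         for c, x in enumerate(x_small)]
print(table[0], table[-1])  # (0.5, 0.5, 1) (2.5, 1.5, 6)
```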

Then you will want to visualize the results to check that everything was done right. For example, with seaborn it's very easy [1]. If you want more control over anything, you can use scikit-learn and matplotlib [2].
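The steps above return colors, not the data values they encode. One way to convert them (not part of the original answer) is to sample the color bar along its length, assume its values vary linearly between its two end labels, and map each cell color to the nearest sampled color. A minimal NumPy sketch, assuming a vertical color bar whose pixel column and top/bottom rows you have measured; colorbar_lookup and color_to_value are hypothetical names. Channel order does not matter as long as the bar and the cells come from the same image, but nearest-color matching can be inaccurate on heavily compressed JPEGs:

```python
import numpy as np

def colorbar_lookup(image, x, top, bottom, value_top, value_bottom):
    """Hypothetical helper: sample a vertical color bar at pixel column x.

    Returns (colors, values): one color per pixel row from top to bottom,
    with values assumed to vary linearly from value_top to value_bottom.
    """
    colors = image[top:bottom + 1, x].astype(np.float64)
    values = np.linspace(value_top, value_bottom, len(colors))
    return colors, values

def color_to_value(color, colors, values):
    """Return the value of the color-bar entry closest to `color` (Euclidean distance)."""
    distances = np.linalg.norm(colors - np.asarray(color, dtype=np.float64), axis=1)
    return values[np.argmin(distances)]

# synthetic bar, BGR order: red at the top (value 100) fading to blue at the bottom (value 0)
bar = np.zeros((101, 1, 3), dtype=np.uint8)
t = np.linspace(0.0, 1.0, 101)
bar[:, 0, 2] = np.round(255 * (1 - t))  # red channel fades out going down
bar[:, 0, 0] = np.round(255 * t)        # blue channel fades in going down
colors, values = colorbar_lookup(bar, x=0, top=0, bottom=100, value_top=100, value_bottom=0)
print(color_to_value(bar[25, 0], colors, values))  # 75.0
```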

Even though it now attempts to answer the right question, it still seems rather vague. I mean, like, *really vague*. — Andrey Tyukin, Mar 25 '18 at 01:08
@marianstefi20 It looks like a more or less viable approach now, but extracting the values from the color-bar in the legend could be painful... — Andrey Tyukin, Mar 25 '18 at 01:20
@Andrey Tyukin I've added a function to generate the legend values. — marianstefi20, Mar 25 '18 at 01:42
Definitely looks like there is some progress there, but the API could be a bit more polished. Maybe having some method that accepts just the image, number of cells, color-range, and the position of the bar, and returns a table, would be nice. Just for your information: even though I have originally downvoted because you answered the wrong question by a link, I have changed my opinion, and am now neutral to slightly optimistic. I have retracted my vote, and I'm now just offering hopefully helpful suggestions. Maybe @DyZ has further helpful suggestions and/or changes in opinion? — Andrey Tyukin, Mar 25 '18 at 01:55
