
I am trying to plot a heatmap from a 2000x2000 NumPy array. I have tried every solution from this post and many others. I have tried many cmaps and interpolation combinations. This is the code that prepares the data:

def parse_cords(cord: float):
    cord = str(cord).split(".")
    h_map[int(cord[0])][int(cord[1])] += 1

df["coordinate"] is a pandas Series of floats, each encoding an x,y coordinate as x.y (x is the integer part, y the digits after the decimal point). Both x and y range from 0 to 1999.

I have decided to modify the array so that values will range from 0 to 1, but I have tested the code also without changing the range.

h_map = np.zeros((2000, 2000), dtype='int')
cords = df["coordinate"].map(lambda cord: parse_cords(cord))
maximum = float(np.max(h_map))
precent = lambda x: x/maximum
h_map = precent(h_map)

h_map looks like this:

[[0.58396242 0.08840799 0.03153833 ... 0.00285187 0.00419393 0.06324442]
 [0.09075658 0.11172622 0.01476262 ... 0.00134206 0.00687804 0.0082201 ]
 [0.02986076 0.01862104 0.03959067 ... 0.00100654 0.00134206 0.00251636]
 ...
 [0.00301963 0.00134206 0.00134206 ... 0.00100654 0.00150981 0.00553598]
 [0.00419393 0.00268411 0.00100654 ... 0.00201309 0.00402617 0.01342057]
 [0.05183694 0.00251636 0.00184533 ... 0.00301963 0.00838785 0.1016608 ]]

Now the plot:

fig, ax = plt.subplots(figsize=figsize)
ax = plt.imshow(h_map)

And the result: [image: final plot]

The result is always a heatmap with only a single color, depending on the cmap used. Is my array just too big to be plotted like this, or am I doing something wrong?

EDIT: I have added plt.colorbar() and removed the scaling from 0 to 1. The plot knows the range of the data (0 to 5500) but renders every value as if it were 0. [image: new_plot]

MDDawid1
  • Do you mean your heatmap always looks like the screenshot you sent (i.e. an empty heatmap)? – SpaceBurger Jun 03 '22 at 15:22
  • @SpaceBurger yes exactly – MDDawid1 Jun 03 '22 at 15:28
  • Assuming `h_map` hasn't changed when you show or export the image, the most obvious possibility to me is that your data is skewed (like 99.99% of values are between 0 and 1, but you have one data point that has value -10 or something like that). Please check your data, maybe display a colorbar to see what matplotlib deals with or plot the distribution of your data. Edit: showing `h_map.min()` should be enough in fact. – SpaceBurger Jun 03 '22 at 15:32
  • In any case, your array surely isn't too big to be plotted. Numpy would return an allocation error, or matplotlib would just take more time to plot the figure if size was an issue. – SpaceBurger Jun 03 '22 at 15:43
  • I have checked it and **min=0**, **max=1** exactly. – MDDawid1 Jun 03 '22 at 16:39
  • You may have an issue with your data type in your array (h_map) definition. Try with `dtype=np.float64`. In your current code you are using `dtype="int"` which would round down all your values when normalizing. May also be limited to 255 values if this is interpreted as int8 (which should interpret all your values modulo 256, which would all look purple based on the colorbar in your answer). – SpaceBurger Jun 07 '22 at 15:07
  • I just checked the docs and it seems the `int` type uses int64 by default, so maybe this is not the problem. Check whether your data is skewed by plotting its distribution or by counting the values above 2500: `nof_high_values = (h_map > 2500).sum()`. – SpaceBurger Jun 07 '22 at 15:24
  • The last possibility I can think of is that the interpolation method is somehow dropping the high values. – SpaceBurger Jun 07 '22 at 15:30
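
The checks suggested in the comments above (minimum, maximum, and the number of unusually large values) can be sketched as follows. This is only an illustration: the array here is synthetic stand-in data, not the asker's `h_map`, and the helper name `summarize` and the threshold of 2500 are assumptions taken from the comment thread.

```python
import numpy as np

# Hypothetical stand-in for h_map: mostly small counts plus one dominant cell.
rng = np.random.default_rng(0)
h_map = rng.poisson(lam=2.0, size=(2000, 2000)).astype(np.int64)
h_map[0, 0] = 5500  # assumed outlier, mirroring the 0 to 5500 colorbar range


def summarize(arr, threshold):
    """Return statistics that reveal whether a few cells dominate the range."""
    return {
        "min": int(arr.min()),
        "max": int(arr.max()),
        "median": float(np.median(arr)),
        "p99": float(np.percentile(arr, 99)),
        # Number of cells above the threshold (hypothetical cut-off).
        "n_above": int((arr > threshold).sum()),
    }


stats = summarize(h_map, threshold=2500)
print(stats)
```

If the 99th percentile is tiny compared with the maximum, a linear color scale will map almost every cell to the lowest color, which would look like a single-color plot.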

1 Answer


I think that is because you only provide one color channel. Therefore, plt.imshow() interprets the data as a black-and-white image. You could either add more channels or use a different function, e.g. sns.heatmap().

import seaborn as sns
Soerendip
  • In this example: https://stackoverflow.com/a/33282548/16075195 only one color channel is provided as well, and it works (I have checked it myself). – MDDawid1 Jun 03 '22 at 15:29
  • I have tried sns.heatmap(), but it ran for over 2 minutes (plt.imshow() is a matter of seconds) and I killed the process. – MDDawid1 Jun 03 '22 at 15:30
  • UPDATE: `sns.heatmap()` finished in 4 minutes, but the plot is also one color. – MDDawid1 Jun 03 '22 at 16:56
  • Are you using a NumPy array as input? You can try providing a color map, `sns.heatmap(..., cmap='jet', vmin=..., vmax=...)`, and play around with the color range `vmin` and `vmax`. It looks like there is a small number of data points that are significantly larger than the rest. You could log-transform your data. – Soerendip Jun 03 '22 at 17:45
  • [Here](https://stackoverflow.com/questions/17201172/a-logarithmic-colorbar-in-matplotlib-scatter-plot) is a proposed solution for applying a logarithmic scale to the color mapping. – SpaceBurger Jun 07 '22 at 15:35
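
The two suggestions in these comments, narrowing the color range with `vmin`/`vmax` and using a logarithmic color scale, might look roughly like this with `plt.imshow()`. It is a sketch on synthetic data, not a confirmed fix for the asker's array: the 99th-percentile cut-off, the `+ 1` shift to avoid zeros under `LogNorm`, and the output filename are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Hypothetical stand-in for h_map: small counts plus one dominant cell.
rng = np.random.default_rng(0)
h_map = rng.poisson(lam=2.0, size=(2000, 2000)).astype(float)
h_map[0, 0] = 5500.0

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: keep a linear scale but clip the color range at the 99th percentile
# (assumed cut-off), so a few huge cells no longer stretch the colormap.
vmax = float(np.percentile(h_map, 99))
im_lin = ax_lin.imshow(h_map, cmap="viridis", vmin=0, vmax=vmax,
                       interpolation="nearest")
ax_lin.set_title("linear, clipped at p99")
fig.colorbar(im_lin, ax=ax_lin)

# Option 2: logarithmic color scale. LogNorm cannot represent zeros,
# so the data is shifted by 1 before plotting.
log_norm = LogNorm(vmin=1, vmax=h_map.max() + 1)
im_log = ax_log.imshow(h_map + 1, cmap="viridis", norm=log_norm,
                       interpolation="nearest")
ax_log.set_title("log scale (data + 1)")
fig.colorbar(im_log, ax=ax_log)

fig.savefig("heatmap_comparison.png", dpi=100)  # hypothetical output name
```

Note that the limits are passed to `LogNorm` itself rather than to `imshow()` alongside `norm`, since recent matplotlib versions reject `vmin`/`vmax` when a `norm` is also given.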