I have a pandas DataFrame called 'result' containing Longitude, Latitude and Production values. It looks like the following. Each pair of latitude and longitude has only one production value, so there are many NaN values.

> Latitude   0.00000   32.00057  32.00078  ...  32.92114  32.98220  33.11217
  Longitude                                ...                              
  -104.5213       NaN       NaN       NaN  ...       NaN       NaN       NaN
  -104.4745       NaN       NaN       NaN  ...       NaN       NaN       NaN
  -104.4679       NaN       NaN       NaN  ...       NaN       NaN       NaN
  -104.4678       NaN       NaN       NaN  ...       NaN       NaN       NaN
  -104.4660       NaN       NaN       NaN  ...       NaN       NaN       NaN

This is my code:

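import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (12.0, 10.0)
plt.rcParams['font.family'] = "serif"
plt.figure(figsize=(14,7))  # overrides the rcParams figure size for this figure
plt.title('Heatmap based on ANN results')
sns.heatmap(result)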

The resulting heatmap looks like this:

[image: current heatmap]

but I want it to look more like this:

[image: desired heatmap]

How can I adjust my code so that the plot looks like the second image?

James Z
Ravan
  • You could try smoothing the image with a Gaussian kernel, such as [scipy gauss filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html) – Jakob Guldberg Aaes Nov 15 '20 at 19:19
  • If you have so few data points and so many missing values, please consider whether replacing all the NaNs with estimates is a good idea, and how to do that estimation correctly in the context of your data. – Niklas Mertsch Nov 16 '20 at 08:51

1 Answer

I made a quick-and-dirty example of how you can smooth data in a NumPy array. It should be directly applicable to pandas DataFrames as well.

First I present the code, then go through it:

# Some needed packages
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.ndimage import gaussian_filter
np.random.seed(42)


# Build an array that is mostly NaN to imitate the OP's data
non_zero_entries = sparse.random(50, 60)  # default density is 0.01
sparse_matrix = non_zero_entries.toarray()  # plain ndarray instead of np.matrix
sparse_matrix[sparse_matrix == 0] = np.nan

# A NaN would propagate through the filter, so set NaNs to 0 first
sparse_matrix[np.isnan(sparse_matrix)] = 0

# Smooth the matrix; sigma controls the width of the Gaussian kernel
smoothed_matrix = gaussian_filter(sparse_matrix, sigma=5)

# Set 0s back to NaN so that they are left blank when plotting
# smoothed_matrix[smoothed_matrix == 0] = np.nan
sparse_matrix[sparse_matrix == 0] = np.nan

# Plot the data
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
                               sharex=False, sharey=True,
                               figsize=(9, 4))
ax1.matshow(sparse_matrix)
ax1.set_title("Original matrix")
ax2.matshow(smoothed_matrix)
ax2.set_title("Smoothed matrix")
plt.tight_layout()
plt.show()

The code is fairly simple. You can't smooth NaNs (a NaN spreads to every output cell that its kernel touches), so we have to get rid of them first. I set them to zero, but depending on your field you might want to interpolate them instead. gaussian_filter then smooths the image, where sigma is the standard deviation of the Gaussian kernel and therefore controls its width.
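If zero is not a sensible stand-in for a missing value, the interpolation mentioned above could look roughly like the following. This is only a sketch: the helper name fill_nans_by_interpolation and the demo array are made up for illustration, it interpolates in array-index space rather than in real longitude/latitude units, and falling back to the nearest known value outside the convex hull of the data is one possible choice among several.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_nans_by_interpolation(grid, method="linear"):
    """Return a copy of `grid` with NaN cells filled from the known cells.

    Hypothetical helper: interpolation happens in (row, column) index space.
    """
    rows, cols = np.indices(grid.shape)
    known = ~np.isnan(grid)
    points = np.column_stack([rows[known], cols[known]]).astype(float)
    values = grid[known]

    filled = griddata(points, values, (rows, cols), method=method)
    # Linear interpolation is undefined outside the convex hull of the
    # known points, so fall back to the nearest known value there.
    nearest = griddata(points, values, (rows, cols), method="nearest")
    return np.where(np.isnan(filled), nearest, filled)


# Made-up demo data: a 20 x 30 grid with only 40 known cells
rng = np.random.default_rng(0)
demo = np.full((20, 30), np.nan)
known_idx = rng.choice(demo.size, size=40, replace=False)
demo.flat[known_idx] = rng.random(40)

filled_demo = fill_nans_by_interpolation(demo)
```

The filled array can then be passed to gaussian_filter in place of the zero-filled one. Whether interpolating this many missing cells is meaningful depends on the data, as the comment under the question points out.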

The plotting code at the end of the script above produces the following images: a random sparse matrix before and after smoothing.
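To apply the same idea to a DataFrame like the OP's 'result', something along these lines might work. It is a sketch under assumptions: smooth_frame is a hypothetical helper, the small result_demo frame contains made-up values standing in for the real data, and NaNs are again treated as zero.

```python
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter


def smooth_frame(frame, sigma=5):
    """Gaussian-smooth a 2-D DataFrame, treating NaN as 0 (an assumption)."""
    values = frame.fillna(0).to_numpy(dtype=float)
    smoothed = gaussian_filter(values, sigma=sigma)
    # Keep the original row and column labels so the heatmap axes still match
    return pd.DataFrame(smoothed, index=frame.index, columns=frame.columns)


# Small stand-in for the OP's `result` (made-up production values)
result_demo = pd.DataFrame(
    [[np.nan, 1.0, np.nan],
     [np.nan, np.nan, 2.0],
     [3.0, np.nan, np.nan]],
    index=pd.Index([-104.5213, -104.4745, -104.4679], name="Longitude"),
    columns=pd.Index([32.00057, 32.00078, 32.92114], name="Latitude"),
)

smoothed_demo = smooth_frame(result_demo, sigma=1)
```

The returned frame could then be passed to sns.heatmap in place of 'result'. Note that the filter works on row and column positions, so unevenly spaced longitude and latitude values are treated as if they were evenly spaced.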