I finally have to ask for help. I am stuck after trying every option I could think of.
I have data sampled from only 10 villages in a region that has at least 105 villages. With this sample I ran a prediction, which worked well, and my final table of predicted values looks like this (unfortunately I am unable to convert the table into something that can be shared here):
My problem is with the interpolation. I want to interpolate the predictions so that they can be overlaid on the unsampled villages, and this is how I did it:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Extract the longitude, latitude, and prediction columns from the decoded dataframe
interpolation_data = decoded_df[['longitude', 'latitude', 'prediction']]
# Remove any rows with missing values
interpolation_data = interpolation_data.dropna()
# Convert the data to numpy arrays
points = interpolation_data[['longitude', 'latitude']].values
values = interpolation_data['prediction'].values
# Define the grid points for interpolation
grid_points = np.vstack((grid_lon.flatten(), grid_lat.flatten())).T
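# Interpolate onto the grid. Note: method='linear' is triangulation-based linear
# interpolation, not IDW, and it returns NaN outside the convex hull of the samples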
interpolated_values = griddata(points, values, grid_points, method='linear')
interpolated_values = interpolated_values.reshape(grid_lon.shape)
# Create a contour plot of the interpolated predictions
plt.contourf(grid_lon, grid_lat, interpolated_values)
plt.colorbar()
plt.scatter(decoded_df['longitude'], decoded_df['latitude'], c=decoded_df['prediction'], cmap='viridis', edgecolors='black')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Interpolated Predictions')
plt.show()
This gave me the following plot:
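The snippet above uses grid_lon and grid_lat without showing how they were built, and griddata with method='linear' leaves NaN everywhere outside the convex hull of the sampled points. Below is a minimal sketch of one way to build the grid and fill those gaps, assuming a regular mesh over the bounding box of the samples and a nearest-neighbour fallback. The coordinates, the prediction values and the helper make_grid are hypothetical and only there for illustration.

```python
import numpy as np
from scipy.interpolate import griddata


def make_grid(lon, lat, n=100, pad=0.0):
    """Return a regular lon/lat mesh covering the bounding box of the inputs."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    xs = np.linspace(lon.min() - pad, lon.max() + pad, n)
    ys = np.linspace(lat.min() - pad, lat.max() + pad, n)
    return np.meshgrid(xs, ys)


# Hypothetical sample locations and predictions (not real data)
lon = np.array([35.60, 35.72, 35.81, 35.66, 35.90, 35.75, 35.58, 35.85, 35.70, 35.95])
lat = np.array([-4.30, -4.12, -4.45, -4.20, -4.25, -4.38, -4.05, -4.10, -4.48, -4.33])
pred = np.array([0.2, 0.5, 0.9, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3, 1.0])

points = np.column_stack((lon, lat))
grid_lon, grid_lat = make_grid(lon, lat, n=50)
grid_points = np.column_stack((grid_lon.ravel(), grid_lat.ravel()))

# Linear interpolation is only defined inside the convex hull of the samples
linear = griddata(points, pred, grid_points, method='linear')
# Nearest-neighbour interpolation is defined everywhere
nearest = griddata(points, pred, grid_points, method='nearest')

# Use the linear value where it exists, otherwise fall back to the nearest sample
filled = np.where(np.isnan(linear), nearest, linear).reshape(grid_lon.shape)
```

With only 10 sampled villages the convex hull is small, so the fallback decides most of what is shown near the edges of the region. Whether nearest-neighbour values are acceptable there is a modelling choice, not something this sketch settles.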
The next step was to overlay the interpolated results on the map of that region, which I did this way:
import geopandas as gpd
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib.pyplot as plt
# Read shapefile data of Babati
shapefile_path = "Babati Villages/Babati_villages.shp" # Replace with the actual path to the shapefile
gdf_babati = gpd.read_file(shapefile_path)
gdf_bti = gdf_babati[gdf_babati["District_N"] == "Babati"]
gdf_bti.head()
# Define the grid points for interpolation
grid_points = np.vstack((grid_lon.flatten(), grid_lat.flatten())).T
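# Interpolate onto the grid. Note: method='linear' is triangulation-based linear
# interpolation, not IDW, and it returns NaN outside the convex hull of the samples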
interpolated_values = griddata(points, values, grid_points, method='linear')
# Reshape the interpolated values to match the grid shape
interpolated_values = interpolated_values.reshape(grid_lon.shape)
from shapely.geometry import box
# Create a bounding box geometry of the Babati region
bbox = box(*gdf_bti.total_bounds)  # total_bounds is (minx, miny, maxx, maxy)
# Clip the interpolated predictions to the extent of the Babati region
interpolated_predictions = gpd.clip(interpolated_predictions, bbox)
# Create subplots
fig, ax = plt.subplots(figsize=(10, 10))
# Plot the shapefile of the Babati region
gdf_bti.plot(ax=ax, facecolor='none', edgecolor='black')
# Reserve a narrow axis on the right for the colorbar
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
# Plot the interpolated predictions once on the map axis and draw the colorbar into cax
interpolated_predictions.plot(ax=ax, column='prediction', cmap='viridis', markersize=30, legend=True, cax=cax)
# Set plot title and labels
ax.set_title('Interpolated Predictions in Babati Region')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
# Show the plot
plt.show()
This is where the problem is: the overlay of the interpolated values is completely off. I expect it to cover all the villages in the region, including the unsampled ones, but it does not. This is what I get:
What am I doing wrong, and how can I fix it?
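For comparison, here is a minimal sketch of one way the overlay layer could be built, assuming the grid is in longitude/latitude (EPSG:4326) and that the goal is to keep only grid cells that fall inside the village polygons. The code above never shows how interpolated_predictions is created, and it clips to a bounding box, which keeps everything inside the rectangle and not only what lies inside the region. The helper grid_to_points, the triangular stand-in for gdf_bti and the placeholder values are all hypothetical.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Polygon


def grid_to_points(grid_lon, grid_lat, grid_values, crs="EPSG:4326"):
    """Turn a regular grid of values into a point GeoDataFrame, skipping NaN cells."""
    flat = np.asarray(grid_values, dtype=float).ravel()
    keep = ~np.isnan(flat)
    return gpd.GeoDataFrame(
        {"prediction": flat[keep]},
        geometry=gpd.points_from_xy(np.ravel(grid_lon)[keep], np.ravel(grid_lat)[keep]),
        crs=crs,
    )


# Hypothetical stand-in for gdf_bti: a single triangular "region"
region = gpd.GeoDataFrame(
    geometry=[Polygon([(35.5, -4.5), (36.0, -4.5), (35.75, -4.0)])],
    crs="EPSG:4326",
)

# Hypothetical grid and placeholder values (not real predictions)
grid_lon, grid_lat = np.meshgrid(
    np.linspace(35.5, 36.0, 20), np.linspace(-4.5, -4.0, 20)
)
grid_values = grid_lon + grid_lat

points_gdf = grid_to_points(grid_lon, grid_lat, grid_values)

# Bring the region into the grid's CRS, then clip to the polygons themselves
clipped = gpd.clip(points_gdf, region.to_crs(points_gdf.crs))
```

If the shapefile is stored in a projected CRS while the grid is in degrees, the two layers will not line up until one of them is reprojected, which is why the sketch calls to_crs before clipping. The clipped layer can then be plotted on the same axis as the region outline.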