0

I have to finally seek help. I am so stuck after trying all options.

Now I have data which was sampled from only 10 villages from a region which has atleast 105 villages. With this sampled data of 10 villages I did prediction which also worked well and my final table with predicted values looks like this(Unfortunately I am unable to convert this table to something that can be shared here): enter image description here

Now my problem is on interpolation . I wanted to interpolate this data to overlay on other unsampled villages and this is how I did it:

from scipy.interpolate import griddata

# Extract the longitude, latitude, and prediction columns from the decoded dataframe
interpolation_data = decoded_df[['longitude', 'latitude', 'prediction']]

# Remove any rows with missing values
interpolation_data = interpolation_data.dropna()

# Convert the data to numpy arrays
points = interpolation_data[['longitude', 'latitude']].values
values = interpolation_data['prediction'].values

# Define the grid points for interpolation
grid_points = np.vstack((grid_lon.flatten(), grid_lat.flatten())).T

# Perform IDW interpolation
interpolated_values = griddata(points, values, grid_points, method='linear')

interpolated_values = interpolated_values.reshape(grid_lon.shape)

# Create a contour plot of the interpolated predictions
plt.contourf(grid_lon, grid_lat, interpolated_values)
plt.colorbar()
plt.scatter(decoded_df['longitude'], decoded_df['latitude'], c=decoded_df['prediction'], cmap='viridis', edgecolors='black')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Interpolated Predictions')
plt.show()

Now this gave me this

enter image description here

Now the next step was to overlay the interpolated results to the map of that region. which I did this way:

import geopandas as gpd
from mpl_toolkits.axes_grid1 import make_axes_locatable

import geopandas as gpd
import matplotlib.pyplot as plt

# Read shapefile data of Babati
shapefile_path = "Babati Villages/Babati_villages.shp"  # Replace with the actual path to the shapefile
gdf_babati = gpd.read_file(shapefile_path)

gdf_bti= gdf_babati[gdf_babati["District_N"] == "Babati"]
gdf_bti.head()

# Define the grid points for interpolation
grid_points = np.vstack((grid_lon.flatten(), grid_lat.flatten())).T

# Perform IDW interpolation
interpolated_values = griddata(points, values, grid_points, method='linear')

# Reshape the interpolated values to match the grid shape
interpolated_values = interpolated_values.reshape(grid_lon.shape)

from shapely.geometry import box
# Create a bounding box geometry of the Babati region
bbox = box(gdf_bti.total_bounds[0], gdf_bti.total_bounds[1],
           gdf_bti.total_bounds[2], gdf_bti.total_bounds[3])

# Clip the interpolated predictions to the extent of the Babati region
interpolated_predictions = gpd.clip(interpolated_predictions, bbox)

# Create subplots
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the shapefile of the Babati region
gdf_bti.plot(ax=ax, facecolor='none', edgecolor='black')

# Plot the interpolated predictions
interpolated_predictions.plot(ax=ax, column='prediction', cmap='viridis', markersize=30, legend=True)

# Add colorbar
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
interpolated_predictions.plot(ax=cax, column='prediction', cmap='viridis', legend=True, cax=cax)

# Set plot title and labels
ax.set_title('Interpolated Predictions in Babati Region')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Show the plot
plt.show()

Here is now where the problem is because the overlay of interpolated values is totally off. I expect it to cover all interpolated villages but its not. this is what I get:

enter image description here

What am I doing wrong and any idea on how to fix this?

LivingstoneM
  • 1,088
  • 10
  • 28
  • Please provide a [minimum working example](https://stackoverflow.com/help/minimal-reproducible-example) that we can run and test. – jared Jun 16 '23 at 04:29
  • Another helpful link related to possible ways to share a DataFrame: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – R_D Aug 18 '23 at 13:03

1 Answers1

0

Disclaimer: Being no expert in geography myself but only used to drawing maps in Python, the following might include some (hopefully!) minor mistakes I'd be glad to correct.

I highly suspect your problem comes from not managing CRS.

Indeed, the coordinates in your original dataset are expressed in terms of longitude and latitude, meaning the CRS here is a geographic coordinate system. In short, Earth is being approximated by a sphere or an ellipsoid and the coordinates give a position on its surface.

Your shapefile, however, is defined in a projected coordinate system, so the position of each point is expressed with an (x, y) tuple in a two-dimensional plane. As implied by its name, this type of CRS defines the process of mapping every point on the surface of a sphere/ellipsoid to a plane (e.g. the projection), and geographers came up with plenty of ways to do just that across history.

How are you supposed to deal with CRS?

You must first find out in which CRS both your dataset and your shapefile are expressed. It is very often expressed as an EPSG code, and you should find this information where you found your dataset/shapefile. It might also already be included in the file you used, in which case it might already be present in your GeoDataFrame and you can find it using:

>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

If it's not already there, you need to set it:

# set it in place...
gdf.set_crs(my_crs, inplace=True)

# ... or not
gdf = gdf.set_crs(my_crs)

Once the CRS is properly defined, you only need to decide in which target CRS you want to draw your map, and change the CRS of both your dataset and your shapefile to this target CRS. Luckily, geopandas knows how to do that:

# set it in place...
gdf.to_crs(target_crs, inplace=True)

# ... or not
gdf = gdf.to_crs(target_crs)

Now that both your dataset and your shapefile share the same CRS, your plot should be fine!

Last note: There is a reasonable chance that your dataset is using EPSG:4326 and your shapefile is using EPSG:3857.

R_D
  • 430
  • 1
  • 7