I'm working with a housing dataset for my own learning purposes and I'd like to be able to overlay my plots on top of a map to provide me with a better understanding of the 'hot spots'.

My code is below:

import pandas as pd
import matplotlib.pyplot as plt

housing = pd.read_csv('https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv')

plt.figure()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4, 
             s= housing['population']/100, label='population', figsize=(10,7),
             c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, zorder=5)
plt.legend()
plt.show()

I saved the map image as 'California.png'.

This is what I tried:

from matplotlib.image import imread

img=imread('California.png')

plt.figure()
plt.imshow(img,zorder=0)
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4, 
             s= housing['population']/100, label='population', figsize=(10,7),
             c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, zorder=5)
plt.legend()
plt.show()

But this just gives me two plots. I've tried switching the index around to no avail.

Is there a simple way to accomplish this? Thanks.

EDIT: Using the code below by @nbeuchat:

plt.figure(figsize=(10,7))
img=imread('California.png')

plt.imshow(img,zorder=0)
ax = plt.gca()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4, 
         s= housing['population']/100, label='population', ax=ax,
         c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, 
         zorder=5)
plt.legend()
plt.show()

I get the following plot:

[image: the resulting plot]

Martin Evans
Walter U.

2 Answers

4

OK, the question is old, but I have a different answer that may be interesting to someone...

I've been working on exactly the same issue. The code that is available on GitHub (https://github.com/ageron/handson-ml.git) does what you need (see 02_end_to_end_machine_learning_project.ipynb).

However, that code uses the California map as an image and just draws the points on top of it. One alternative is to build a real map and plot the points on it, without having to read the map image. To do this, I used the code below. You will need to install cartopy, and if you also want county lines you have to draw them using the instructions from here.

In the end, the generated image was this: [image: Housing prices in California, plotted with Cartopy]

And here is the code I used:

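# Trying to use a real map
# (assumes the housing DataFrame from the question is already loaded)
import matplotlib.pyplot as plt
import numpy as np
import cartopy.crs as ccrs
import cartopy.feature as cfeature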

plt.figure(figsize=(10,7))

# Creates the map
ca_map = plt.axes(projection=ccrs.PlateCarree())

ca_map.add_feature(cfeature.LAND)
ca_map.add_feature(cfeature.OCEAN)
ca_map.add_feature(cfeature.COASTLINE)
ca_map.add_feature(cfeature.BORDERS, linestyle=':')
ca_map.add_feature(cfeature.LAKES, alpha=0.5)
ca_map.add_feature(cfeature.RIVERS)
ca_map.add_feature(cfeature.STATES.with_scale('10m'))

# To add county lines
import cartopy.io.shapereader as shpreader

reader = shpreader.Reader('datasets/housing/countyl010g.shp')
counties = list(reader.geometries())
COUNTIES = cfeature.ShapelyFeature(counties, ccrs.PlateCarree())
ca_map.add_feature(COUNTIES, facecolor='none', edgecolor='gray')

ca_map.xaxis.set_visible(True)
ca_map.yaxis.set_visible(True)

# Plots the data onto map
plt.scatter(housing['longitude'], housing['latitude'], alpha=0.4, 
            s=housing["population"]/100, label="population",
            c=housing['median_house_value'], 
            cmap=plt.get_cmap("jet"), 
            transform=ccrs.PlateCarree())

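# Colorbar: put the ticks at the same values the labels are built from,
# otherwise the relabelled ticks would not match the colour scale
prices = housing["median_house_value"]
tick_values = np.linspace(prices.min(), prices.max(), 11)
cbar = plt.colorbar(ticks=tick_values)
cbar.ax.set_yticklabels(["$%dk" % round(v / 1000) for v in tick_values], fontsize=14)
cbar.set_label('Median House Value', fontsize=16)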

# Plot labels
plt.ylabel("Latitude", fontsize=14)
plt.xlabel("Longitude", fontsize=14)
plt.legend()

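# save_fig is a helper defined in the handson-ml notebook;
# outside that notebook, plt.savefig("housing_prices_scatterplot_cartopy.png") does the same job
save_fig("housing_prices_scatterplot_cartopy")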

The advantage of this approach is that it uses a real map, so the code can easily be adapted to whatever part of the world you want to plot. Have fun!

3

You are creating a new figure by calling the dataframe plot function without telling it where to draw. You should pass the axes on which you want to draw your second plot through the ax argument. One way is to use gca to get the current axes.
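To see the effect in isolation, here is a minimal sketch that counts the open figures with and without the ax argument. The three-row DataFrame and the blank image are made-up stand-ins for the housing data and California.png, and the Agg backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to view plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the housing data and the map image
df = pd.DataFrame({"longitude": [-122.4, -118.2, -121.5],
                   "latitude": [37.8, 34.1, 38.6]})
img = np.zeros((4, 4, 3))

# Without ax: DataFrame.plot opens its own figure next to the image figure
plt.close("all")
plt.figure()
plt.imshow(img)
df.plot(x="longitude", y="latitude", kind="scatter")
figures_without_ax = len(plt.get_fignums())

# With ax: the scatter is drawn on the axes that already hold the image
plt.close("all")
fig, ax = plt.subplots()
ax.imshow(img)
returned_ax = df.plot(x="longitude", y="latitude", kind="scatter", ax=ax)
figures_with_ax = len(plt.get_fignums())

print(figures_without_ax, figures_with_ax)
```

The first count should be higher than the second, which matches the "two plots" symptom in the question. Using plt.subplots() and keeping the ax object is equivalent to calling plt.gca() after imshow; it just makes the target axes explicit.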

The following should work (not tested though):

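from matplotlib.image import imread

plt.figure(figsize=(10,7))
img=imread('California.png')

# extent = [left, right, bottom, top] in data (longitude/latitude) coordinates
plt.imshow(img, zorder=0,
           extent=[housing['longitude'].min(), housing['longitude'].max(),
                   housing['latitude'].min(), housing['latitude'].max()])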
ax = plt.gca()
housing.plot(x='longitude', y='latitude', kind='scatter', alpha=0.4, 
         s= housing['population']/100, label='population', ax=ax,
         c= 'median_house_value', cmap=plt.get_cmap('jet'), colorbar=True, 
         zorder=5)
plt.legend()
plt.show()

EDIT: using the extent parameter of imshow with the minimum and maximum values of your longitude and latitude data will scale the image correctly.
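As a rough illustration of what extent does, the sketch below stretches a random placeholder image over a longitude/latitude box and draws three made-up points on the same axes. The image, the coordinates, and the Agg backend are all assumptions for the sake of a self-contained example; with the real data you would use California.png and the housing columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to view plots
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins: a random 50x80 RGB "map" and three lon/lat points
img = np.random.default_rng(0).random((50, 80, 3))
lon = np.array([-124.3, -120.0, -114.3])
lat = np.array([32.5, 37.0, 42.0])

# extent is [left, right, bottom, top] in data coordinates
extent = [lon.min(), lon.max(), lat.min(), lat.max()]

fig, ax = plt.subplots(figsize=(10, 7))
im = ax.imshow(img, extent=extent, zorder=0)   # image now spans the lon/lat box
ax.scatter(lon, lat, c="red", zorder=5)        # points share the same coordinates
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")

image_extent = im.get_extent()
print(image_extent)
```

Note that taking the extent from the data's minimum and maximum assumes the image is cropped exactly to the bounding box of the points; if the map covers a larger area, its real geographic bounds are the better choice. imshow also defaults to aspect='equal', so passing aspect='auto' may be worth trying if the map looks squashed.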

nbeuchat
  • Plot looks odd, scatterplot is small and in the corner. – Walter U. Aug 29 '17 at 20:50
  • imshow does not know what 1px means in terms of latitude/longitude. You need to set it explicitly. You can use the `extent` parameter of `imshow`. I did that a couple of weeks ago, let me check my notebooks – nbeuchat Aug 29 '17 at 20:57
  • Doing this would also have the advantage of having the correct x and y tick labels on your plot – nbeuchat Aug 29 '17 at 21:01
  • Genius! Thanks a lot, saved me a lot of headaches. – Walter U. Aug 29 '17 at 21:04