Plotting two variables then coloring by a third variable

Question

I have a dataset from an aircraft flight and I am trying to plot the position of the plane (longitude x latitude) then color that line by the altitude of the plan at those coordinates. My code looks like this:

lat_data = np.array( [ 39.916294, 39.87139 , 39.8005  , 39.70801 , 39.64645 , 39.58172 ,
       39.537853, 39.55141 , 39.6787  , 39.796528, 39.91702 , 40.008347,
       40.09513 , 40.144157, 40.090584, 39.96447 , 39.838924, 39.712112,
       39.597103, 39.488377, 39.499096, 39.99354 , 40.112175, 39.77281 ,
       39.641186, 39.51512 , 39.538853, 39.882736, 39.90413 , 39.811333,
       39.73279 , 39.65676 , 39.584026, 39.5484  , 39.54484 , 39.629486,
       39.96    , 40.07143 , 40.187405, 40.304718, 40.423153, 40.549305,
       40.673313, 40.794548, 40.74402 , 40.755558, 40.770306, 40.73574 ,
       40.795086, 40.774628] )

long_data = np.array( [ -105.13034 , -105.144104, -105.01132 , -104.92708 , -104.78505 ,
       -104.6449  , -104.49255 , -104.36578 , -104.32623 , -104.31285 ,
       -104.32199 , -104.41774 , -104.527435, -104.673935, -104.81152 ,
       -104.82184 , -104.81882 , -104.81314 , -104.74657 , -104.78108 ,
       -104.93442 , -104.98039 , -105.0168  , -105.04967 , -105.056564,
       -105.03639 , -105.13429 , -105.05214 , -105.17435 , -105.070526,
       -104.93587 , -104.80029 , -104.65973 , -104.50339 , -104.33972 ,
       -104.21634 , -103.96216 , -103.84808 , -103.72534 , -103.60455 ,
       -103.48926 , -103.376495, -103.25937 , -103.10858 , -103.08469 ,
       -103.24878 , -103.4169  , -103.53073 , -103.23694 , -103.41254 ] )

altitude_data = np.array( [1.6957603e+00,  1.9788861e+00,  1.8547169e+00,  1.8768315e+00,
        1.9633590e+00,  2.0504241e+00,  2.1115899e+00,  2.1085002e+00,
        1.8621666e+00,  1.8893014e+00,  1.8268168e+00,  1.7574688e+00,
        1.7666028e+00,  1.7682364e+00,  1.8120643e+00,  1.7637002e+00,
        1.8054264e+00,  1.9149075e+00,  2.0173934e+00,  2.0875392e+00,
        2.1486480e+00,  1.8622510e+00,  1.7937366e+00,  1.8748144e+00,
        1.9063262e+00,  1.9397615e+00,  2.1261981e+00,  2.0180094e+00,
        1.9827688e+00, -9.9999990e+06,  1.8933343e+00,  1.9615903e+00,
        2.1000245e+00,  2.1989927e+00,  2.3200927e+00, -9.9999990e+06,
        4.0542388e+00,  4.0591464e+00,  4.0597038e+00,  4.3395977e+00,
        4.6702847e+00,  5.0433373e+00,  5.2824092e+00,  5.2813010e+00,
        5.2735353e+00,  5.2784677e+00,  5.2784038e+00,  5.2795196e+00,
        4.9482727e+00,  4.2531524e+00] )

import matplotlib as plt    

fig, ax1 = plt.subplots( figsize = ( 10, 10 ) )
ax1.plot( long_data, lat_data, alpha = .4)
ax1.scatter( long_data, lat_data, c = altitude_data )
plt.show()

Which gives us this track: .

Is there a way to consolidate the data into one line that plots the location of the aircraft and adjusts the color for the elevation?

While plotting a line and a scatter together works, it does not look very good when I put in all the data (n = 2400 ). Thanks!

Do you mean that you don't want the line, and instead just the scatter location points and a colorbar for elevation? — Jacob K, Oct 29 '20 at 21:47
@Mr. T the -9.9999990+06 value is where the altimeter must have stopped reporting. Sorry, should've removed that data point for this example! — danrod13, Oct 30 '20 at 18:15

score 2 · Answer 1 · edited Oct 30 '20 at 00:06

So, I have something that is pretty close. there will be some missing/averaging of altitude data though.

from matplotlib import pyplot as plt
import matplotlib
import matplotlib.cm as cm
#... define arrays ...

fig, ax1 = plt.subplots( figsize = ( 10, 10 ) )
minima = min(altitude_data)
maxima = max(altitude_data)

norm = matplotlib.colors.Normalize(vmin=0, vmax=maxima, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.summer)

pointsPerColor = 2

for x in range(len(lat_data)//pointsPerColor):
    startIndex = x * pointsPerColor
    stopIndex = startIndex + pointsPerColor + 1

    #get color for this section
    avgAltitude = sum(altitude_data[startIndex:stopIndex])/pointsPerColor
    rbga = mapper.to_rgba(avgAltitude)

    #plot section (leng)
    ax1.plot( long_data[startIndex:stopIndex], 
            lat_data[startIndex:stopIndex], 
            alpha=.7,color=rbga )

plt.show()

So what's happening in order is..

get min & max of your altitude & use that to make a color mapper there's several color options
determine interval. need atleast 2 points to make a line obviously
loop for (number of points)/pointsPerColor (need to do integer division) a. get average color b. plot segment with color

thats it!.. I probably could've done this a lil prettier but it works also.. those super low values messed the mapping..so I just set min to 0

line plot with color scale of altitude data

This looks great! Sorry about those low values...I should've taken them out for this example. Any ideas for how to do this without using a for loop? The full dataset is ~2400 points so any for loop will take a while. Not a big deal if there isn't, this code looks like exactly what I need! — danrod13, Oct 30 '20 at 18:34
Just ran this and it looks fantastic. Thanks again for the help! — danrod13, Oct 30 '20 at 19:50

Mr. T · Answer 2 · 2020-11-05T18:13:45.357

Update
As discussed, here now the code without a for loop and including a fourth category, e.g., acceleration. Now the code uses Line3DCollection to generate the trajectory and a custom-made color map with LinearSegmentedColormap to indicate the fourth category (acceleration):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Line3DCollection
from matplotlib.colors import LinearSegmentedColormap

fig = plt.figure(figsize=(12,12))
ax = fig.gca(projection='3d')

#rolling average between two acceleration data points
aver_accel = np.convolve(acceleration_data, np.ones((2,))/2, mode='valid')     

#custom colour map to visualize acceleartion and decelaration
cmap_bgr = LinearSegmentedColormap.from_list("bluegreyred", ["red", "lightgrey", "lightgrey", "blue"])

#creating the trajectory as line segments
points = np.transpose([lat_data, long_data, altitude_data])
window = (2, 3)
view_shape = (len(points) - window[0] + 1,) + window 
segments = np.lib.stride_tricks.as_strided(points, shape = view_shape, strides = (points.itemsize,) + points.strides)
trajectory = Line3DCollection(segments, cmap=cmap_bgr, linewidth=3)
#set the colour according to the acceleration data
trajectory.set_array(aver_accel)
#add line collection and plot color bar for acceleration
cb = ax.add_collection(trajectory)
cbar = plt.colorbar(cb, shrink=0.5)
cbar.set_label("acceleration", rotation=270)

#let's call it "autoscale"
ax.set_xlim(min(lat_data), max(lat_data))
ax.set_ylim(min(long_data), max(long_data))
ax.set_zlim(min(altitude_data), max(altitude_data))

ax.set_xlabel("latitude")
ax.set_ylabel("longitude")
ax.set_zlabel("altitude")

plt.show()

Sample output (with arbitrary acceleration data):

Thanks to the tailored colormap, one can clearly see acceleration and deceleration phases. Since we directly use the array, a colorbar for calibration can be easily added. Mind you, you still have the variable linewidth that also takes an array (for instance for velocity), although this will probably then be difficult to read. There is also substantial time gain in the generation of large-scale 3D line collections thanks to this marvellous answer.

For comparison, here the 2D view as produced by the other answers:

Original answer
Since you have 3D data, why not create a 3D projection? You can always move the view into a 2D projection if you feel like it. To avoid the problem that the color is defined by the first point of each line (i.e., a steep ascent would look different from a steep descent), this program determines the middle point of each line for the color-coded altitude calculation. Disadvantages: Uses a slow for loop, and the altitude colors are normalized between 0 and 1 (which doesn't matter here because altitude is overdetermined in this 3D projection but will become a problem if you want to color-code another parameter).

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure(figsize=(10,10))
ax = fig.gca(projection='3d')

min_alt = np.min(altitude_data)
max_alt = np.max(altitude_data)
#generate normalized altitude array for colour code
#the factor 0.95 filters out the end of this colormap
cols_raw = 0.95 * (altitude_data-min_alt) / (max_alt-min_alt) 
#rolling average between two data point colors
cols = np.convolve(cols_raw, np.ones((2,))/2, mode='valid')     

for i, col in enumerate(cols):
    ax.plot(lat_data[i:i+2], long_data[i:i+2], altitude_data[i:i+2], c=cm.gnuplot(col))

ax.set_xlabel("latitude")
ax.set_ylabel("longitude")
ax.set_zlabel("altitude")

plt.show()

The sample data for the above outputs:

lat_data = np.array( [ 39.916294, 39.87139 , 39.8005  , 39.70801 , 39.64645 , 39.58172 ,
     39.537853, 39.55141 , 39.6787  , 39.796528, 39.91702 , 40.008347,
     40.09513 , 40.144157, 40.090584, 39.96447 , 39.838924, 39.712112,
     39.597103, 39.488377, 39.499096, 39.99354 , 40.112175, 39.77281 ,
     39.641186, 39.51512 , 39.538853, 39.882736, 39.90413 , 39.811333,
     39.73279 , 39.65676 , 39.584026, 39.5484  , 39.54484 , 39.629486,
     39.96    , 40.07143 , 40.187405, 40.304718, 40.423153, 40.549305,
     40.673313, 40.794548, 40.74402 , 40.755558, 40.770306, 40.73574 ,
     40.795086, 40.774628] )
  
long_data = np.array( [ -105.13034 , -105.144104, -105.01132 , -104.92708 , -104.78505 ,
       -104.6449  , -104.49255 , -104.36578 , -104.32623 , -104.31285 ,
       -104.32199 , -104.41774 , -104.527435, -104.673935, -104.81152 ,
       -104.82184 , -104.81882 , -104.81314 , -104.74657 , -104.78108 ,
       -104.93442 , -104.98039 , -105.0168  , -105.04967 , -105.056564,
       -105.03639 , -105.13429 , -105.05214 , -105.17435 , -105.070526,
       -104.93587 , -104.80029 , -104.65973 , -104.50339 , -104.33972 ,
       -104.21634 , -103.96216 , -103.84808 , -103.72534 , -103.60455 ,
       -103.48926 , -103.376495, -103.25937 , -103.10858 , -103.08469 ,
       -103.24878 , -103.4169  , -103.53073 , -103.23694 , -103.41254 ] )

altitude_data = np.array( [1.6957603e+00,  1.9788861e+00,  1.8547169e+00,  1.8768315e+00,
        1.9633590e+00,  2.0504241e+00,  2.1115899e+00,  2.1085002e+00,
        1.8621666e+00,  1.8893014e+00,  1.8268168e+00,  1.7574688e+00,
        1.7666028e+00,  1.7682364e+00,  1.8120643e+00,  1.7637002e+00,
        1.8054264e+00,  1.9149075e+00,  2.0173934e+00,  2.0875392e+00,
        2.1486480e+00,  1.8622510e+00,  1.7937366e+00,  1.8748144e+00,
        1.9063262e+00,  1.9397615e+00,  2.1261981e+00,  2.0180094e+00,
        1.9827688e+00,  1.9999990e+00,  1.8933343e+00,  1.9615903e+00,
        2.1000245e+00,  2.1989927e+00,  2.3200927e+00,  2.9999990e+00,
        4.0542388e+00,  4.0591464e+00,  4.0597038e+00,  4.3395977e+00,
        4.6702847e+00,  5.0433373e+00,  5.2824092e+00,  5.2813010e+00,
        5.2735353e+00,  5.2784677e+00,  5.2784038e+00,  5.2795196e+00,
        4.9482727e+00,  4.2531524e+00] )

acceleration_data = np.array( 
    [1,   2,   2,   3,
     3,   3,   2,   2,
     2,   2,   4,   5,
     4,   3,   4,   3,
     3,   3,   3,   4,
     3,   3,   4,   5,
     4,   4,   4,   5,
     4,   15,  26,  49,
     67,  83,  89,  72,
     77,  63,  75,  82,
     69,  37,  5,  -29,
     -37, -27, -29, -14,
     9,   4] )

This is awesome. The 3D plot looks really cool! I was planning on plotting in 2D because I will substitute the altitude data for other variables (i.e. velocity, pitch, roll, attack etc.) so the 3D model won't work as well for those variables. Your code for differentiating the colors looks like what I'm looking for. I'm guessing I will have to provide a 'long_data[i:i+2]' for each color change I want? — danrod13, Oct 30 '20 at 18:25
Regarding the color question - unfortunately, in contrast to scatter plots, line plots do not take a colour array (not sure why). There [are examples how you can use LineCollections](https://stackoverflow.com/a/44649053/8881141) to colour your lines differently that may speed up the process because they use numpy functions. — Mr. T, Oct 30 '20 at 19:02
Got it. I will look into LineCollections. Thanks for your help! Here's the [final result](https://imgur.com/rkWb5uA) for the figure. This is multiple flights so I'm going to parse out all the data so it's legible. — danrod13, Oct 30 '20 at 19:41
I added sample code using `Line3DCollection` to illustrate my point how to use it. Thanks to the help of the community, it should also perform faster with large-scale datasets like yours than the usual approaches I have seen. — Mr. T, Oct 31 '20 at 14:39

HotDogCannon · Accepted Answer · 2020-10-30T23:07:00.860

It looks like if you want to use a Line2D object, you're stuck with a single color per object. As a workaround, you could plot each line segment as a set of (first order linearly) interpolated segments and color each of those by its corresponding infinitesimal value.

It looks like this functionality is contained in a LineCollection instance, however I just went for a more quick and dirty approach below.

For extra credit, since we're talking about geospatial data here, why not use cartopy to plot your data? That way you can have a "basemap" which gives you some reference. After all, if it's worth plotting, it's worth plotting beautifully.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import cartopy
import cartopy.crs as ccrs

import numpy as np
import scipy
from scipy import interpolate

import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt

### clean data
filter_inds   = np.where(np.abs(altitude_data) < 100)
lat_data      = lat_data[filter_inds]
long_data     = long_data[filter_inds]
altitude_data = altitude_data[filter_inds]

# =============== plot

plt.close('all')
plt.style.use('dark_background') ## 'default'
fig = plt.figure(figsize=(1500/100, 1000/100))
#ax1 = plt.gca()

lon_center = np.mean(long_data); lat_center = np.mean(lat_data)

ax1 = plt.axes(projection=ccrs.Orthographic(central_longitude=lon_center, central_latitude=lat_center))
ax1.set_aspect('equal')

scale = 3 ### 'zoom' with smaller numbers
ax1.set_extent((lon_center-((0.9*scale)), lon_center+((0.7*scale)), lat_center-(0.5*scale), lat_center+(0.5*scale)), crs=ccrs.PlateCarree())

### states
ax1.add_feature(cartopy.feature.NaturalEarthFeature(category='cultural', scale='10m', facecolor='none', name='admin_1_states_provinces_shp'), zorder=2, linewidth=1.0, edgecolor='w')

ax1.add_feature(cartopy.feature.RIVERS.with_scale('10m'), zorder=2, linewidth=1.0, edgecolor='lightblue')
ax1.add_feature(cartopy.feature.LAKES.with_scale('10m'), zorder=2, linewidth=1.0, edgecolor='gray')

### download counties from https://prd-tnm.s3.amazonaws.com/StagedProducts/Small-scale/data/Boundaries/countyl010g_shp_nt00964.tar.gz
### untar with : tar -xzf countyl010g_shp_nt00964.tar.gz

try:
    reader = cartopy.io.shapereader.Reader('countyl010g.shp')
    counties = list(reader.geometries())
    COUNTIES = cartopy.feature.ShapelyFeature(counties, ccrs.PlateCarree())
    ax1.add_feature(COUNTIES, facecolor='none', alpha=0.5, zorder=2, edgecolor='gray')
except:
    pass

#norm = matplotlib.colors.Normalize(vmin=altitude_data.min(), vmax=altitude_data.max())
norm = matplotlib.colors.Normalize(vmin=1.0, vmax=6.0)
cmap = matplotlib.cm.viridis
mappableCmap = matplotlib.cm.ScalarMappable(norm=norm, cmap=cmap)

# ===== plot line segments individually for gradient effect

for i in range(long_data.size-1):
    long_data_this_segment = long_data[i:i+2]
    lat_data_this_segment  = lat_data[i:i+2]
    altitude_data_this_segment  = altitude_data[i:i+2]
    
    ### create linear interp objects
    ### scipy doesnt like when the data isn't ascending (hence the flip)
    
    try:
        spl_lon = scipy.interpolate.splrep(altitude_data_this_segment, long_data_this_segment, k=1)
        spl_lat = scipy.interpolate.splrep(altitude_data_this_segment, lat_data_this_segment,  k=1)
    except:
        long_data_this_segment = np.flip(long_data_this_segment)
        lat_data_this_segment = np.flip(lat_data_this_segment)
        altitude_data_this_segment = np.flip(altitude_data_this_segment)
        spl_lon = scipy.interpolate.splrep(altitude_data_this_segment, long_data_this_segment, k=1)
        spl_lat = scipy.interpolate.splrep(altitude_data_this_segment, lat_data_this_segment,  k=1)
    
    ### linearly resample on each segment
    nrsmpl=100
    altitude_data_this_segment_rsmpl = np.linspace(altitude_data_this_segment[0],altitude_data_this_segment[1],nrsmpl)
    long_data_this_segment_rsmpl = scipy.interpolate.splev(altitude_data_this_segment_rsmpl, spl_lon)
    lat_data_this_segment_rsmpl = scipy.interpolate.splev(altitude_data_this_segment_rsmpl, spl_lat)
    
    for j in range(long_data_this_segment_rsmpl.size-1):
        
        long_data_this_segment_2 = long_data_this_segment_rsmpl[j:j+2]
        lat_data_this_segment_2  = lat_data_this_segment_rsmpl[j:j+2]
        altitude_data_this_segment_2  = altitude_data_this_segment_rsmpl[j:j+2]
        
        ax1.plot(long_data_this_segment_2, lat_data_this_segment_2, transform=ccrs.PlateCarree(), c=mappableCmap.to_rgba(np.mean(altitude_data_this_segment_2)), zorder=3, linestyle='solid', alpha=0.8, lw=5.0)

# =====

### plot the actual data points as a scatter plot
pts = ax1.scatter(long_data, lat_data, transform=ccrs.PlateCarree(), alpha=1.0, marker='o', c=mappableCmap.to_rgba(altitude_data), edgecolor='w', zorder=4)

cbar = fig.colorbar(mappable=mappableCmap, ax=ax1, orientation='vertical', fraction=0.046, pad=0.04)
cbar.set_label(r'$Altitude$ [units]', fontsize=20)
cbar.ax.tick_params(labelsize=16)
cbar.set_ticks(np.linspace(1.0, 6.0, 5+1), update_ticks=True)
cbar.set_ticklabels([ ('%0.1f' % x) for x in cbar.get_ticks() ])

fig.tight_layout()
fig.savefig('flightPath.png',dpi=100)
plt.show()

score 1 · Answer 4 · answered Oct 29 '20 at 22:32

Here is my solution using Plotly's ScatterGeo object as well as Pandas and NumPy to load in the data. I chose this package since you could then have an interactive plot (with zoom and hover data) and also see which states the plane flew over :).

# Import packages
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Load your data into a Pandas DataFrame object
d = {'Lat': lat_data, 'Long': long_data, 'Altitude': altitude_data}
df = pd.DataFrame(data=d)

# Create scatterGeo object with the proper data 
scatterMapData = go.Scattergeo(lon = df['Long'], lat = df['Lat'], text=df['Altitude'],
                               mode = 'markers+lines', marker_color = df['Altitude'],
                               marker = dict(colorscale = 'Viridis', cmin = 0, 
                                             cmax = df['Altitude'].max(),
                                             colorbar_title = "Altitude",
                                             #line = dict(width=1, color='black')
                                            )
                               )

# Load scatterMapData object into Plotly Figure
# and configure basic options for title and scoping
fig = go.Figure(data=scatterMapData)
fig.update_layout(title = 'Plane Flight Data', geo_scope = 'usa',
                  geo = dict(scope = 'usa',
                             #projection_scale = 5,
                             center={'lat': np.median(df['Lat']), 'lon': np.median(df['Long'])})
                 )

# Finally show the plot
fig.show()

Here is a zoomed in version of the plot:

I just want to point out that you can change to mode='marker' in the scattergeo object for just a scatter plot and mode='lines' for just a line plot connecting each of the locations.

I like what you did here. It looks very clean. I was hoping to get just a line (that plots position) while coloring for the altitude instead of a scatter plot connected by a line. When I put in all ~2400 datapoints it looks pretty busy and the connecting line starts to dominate. — danrod13, Oct 30 '20 at 18:29

Plotting two variables then coloring by a third variable

4 Answers4

Linked