I'm plotting a couple of scatter plots with a lot of data points. Beyond a certain number of points, half the plot is just solid color and you can no longer see the density very well. So I want to "project" the data onto each axis and display a histogram there.

I wrote a little function that does that. On the axes ax it plots the columns column_x vs. column_y of the pandas DataFrame frame. If frame_one_track is given, it is plotted on top of that as well. To add a title, labels, etc., a lambda can be passed that takes the ax object as its parameter.

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import pandas as pd

def projection_plot(ax, frame, column_x, column_y, frame_one_track=None, commands=None, bins=100):
    # Main scatter plot of the first frame
    ax.scatter(frame[column_x], frame[column_y], label="one track", marker='x')

    # Attach a histogram of the x values above the scatter plot
    divider = make_axes_locatable(ax)
    ax_hist_x = divider.append_axes("top", 1.2, pad=0.1, sharex=ax)
    for tl in ax_hist_x.get_xticklabels():
        tl.set_visible(False)
    ax_hist_x.hist(frame[column_x], bins=bins)  # was hard-coded to 50, ignoring the bins argument

    # Attach a histogram of the y values to the right of the scatter plot
    ax_hist_y = divider.append_axes("right", 1.2, pad=0.1, sharey=ax)
    for tl in ax_hist_y.get_yticklabels():
        tl.set_visible(False)
    ax_hist_y.hist(frame[column_y], orientation='horizontal', bins=bins)

    # Optional second frame, drawn on top of the first
    if frame_one_track is not None:
        ax.scatter(frame_one_track[column_x], frame_one_track[column_y], label="two tracks", marker='.')
        ax_hist_x.hist(frame_one_track[column_x], bins=bins)
        ax_hist_y.hist(frame_one_track[column_y], orientation='horizontal', bins=bins)

    # Optional callback for titles, labels, scales, etc.
    if commands is not None:
        commands(ax)

If I now plot some random data, everything looks fine and as intended.

df = pd.DataFrame(np.random.randn(1000, 3)*1000, columns=["a", "b", "c"])
cut = df["c"] < 20
frame1 = df[cut]
frame2 = df[~cut]

plt.figure(figsize=(6,6))
projection_plot(plt.subplot(), frame1, "a", "b", frame2, commands=lambda ax: (
    ax.legend(),
    ax.set_title("Random Values", y=1.4),
    ax.set_xlabel("column 0"),
    ax.set_ylabel("column 1")))

correct plot

If I now try to set the scale of either axis (or both) to log, something breaks and the plot becomes unreadable:

plt.figure(figsize=(6,6))
projection_plot(plt.subplot(), frame1, "a", "b", frame2, commands=lambda ax: (
    ax.legend(),
    ax.set_yscale('log'),
    ax.set_title("Random Values", y=1.4),
    ax.set_xlabel("column 0"),
    ax.set_ylabel("column 1")))

broken plot

With some of my data sets it seemed to work fine, while with others it breaks, as it does with this random data. How can I fix this?

Also: since I'm relatively new to Python, is this good coding style? Passing multi-line lambdas for further configuration? I have the feeling that Ruby blocks ruined me…

ImportanceOfBeingErnest
Philipp Stephan

1 Answer

I don't know exactly why this fails, but I suspect the problem is that the data extends below 0, where a log scale is not defined.

In any case, you would need to set the limits of the plot manually:

ax.set_yscale('log')
ax.set_ylim(1,None)
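If a hard-coded lower limit of 1 does not suit the data, the limit can be derived from the data instead. The following is only a sketch under my own assumptions: `positive_lower_limit` is a hypothetical helper (not part of matplotlib), and the random data merely mimics the example in the question.

```python
import numpy as np

def positive_lower_limit(values):
    """Hypothetical helper: smallest strictly positive finite value, or None."""
    arr = np.asarray(values, dtype=float)
    positive = arr[np.isfinite(arr) & (arr > 0)]
    return float(positive.min()) if positive.size else None

rng = np.random.default_rng(0)
data = rng.standard_normal(1000) * 1000   # similar to the question's random data

n_bad = int(np.sum(data <= 0))            # values a log axis cannot display
lower = positive_lower_limit(data)
print(f"{n_bad} non-positive values; smallest positive value: {lower}")

# With a matplotlib Axes `ax`, the result could then be used as:
#   ax.set_yscale('log')
#   ax.set_ylim(lower, None)
```

Counting the non-positive values first is also a quick way to check whether a data set that "should" be strictly positive really is.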


Alternatively, you may want to use a symlog scale instead:

ax.set_yscale('symlog')

In this case no limit adjustment needs to be made.
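As a minimal, self-contained sketch of the symlog variant (the random data and the `Agg` backend are my own choices so that it runs without a display; the plot is a plain scatter, not the full `projection_plot` from the question):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) * 1000
y = rng.standard_normal(1000) * 1000   # contains negative values

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x, y, marker='.')
ax.set_yscale('symlog')   # linear around zero, logarithmic further out
fig.canvas.draw()

# The autoscaled limits span both signs, so no manual set_ylim is needed.
low, high = ax.get_ylim()
print(low, high)
```

Unlike the plain log scale, the negative half of the data stays visible here.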


ImportanceOfBeingErnest
  • My actual data set should not contain values less than or equal to 0, but apparently it did. Setting the limits manually works perfectly, thanks. I'm still puzzled, though, why the automatic detection of limits would fail here when it doesn't on a regular scatter plot. – Philipp Stephan Mar 14 '18 at 03:50
  • I posted this as an [issue on the GitHub tracker](https://github.com/matplotlib/matplotlib/issues/10782). – ImportanceOfBeingErnest Mar 14 '18 at 14:30