matplotlib: unreadable scatter plots and histograms on shared axis with logscale

Question

I'm plotting a couple of scatter plots with a lot of data points. At some point half the plot is just solid color and you cannot see the density very well. So I want to "project" the data onto the axis and display a histogram.

I wrote a small function that does that. On the axes ax, it plots column_x against column_y of the pandas DataFrame frame. If frame_one_track is given, it is plotted on top of that. To add a title, labels, etc., a lambda can be passed that takes the ax object as its parameter.

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import pandas as pd

def projection_plot(ax, frame, column_x, column_y, frame_one_track=None, commands=None, bins=100):
    ax.scatter(frame[column_x], frame[column_y], label="one track", marker='x')

    divider = make_axes_locatable(ax)
    ax_hist_x = divider.append_axes("top", 1.2, pad=0.1, sharex=ax)
    for tl in ax_hist_x.get_xticklabels():
        tl.set_visible(False)
    ax_hist_x.hist(frame[column_x], bins=bins)

    ax_hist_y = divider.append_axes("right", 1.2, pad=0.1, sharey=ax)
    for tl in ax_hist_y.get_yticklabels():
        tl.set_visible(False)
    ax_hist_y.hist(frame[column_y], orientation='horizontal', bins=bins)

    if frame_one_track is not None:
        ax.scatter(frame_one_track[column_x], frame_one_track[column_y], label="two tracks", marker='.')
        ax_hist_x.hist(frame_one_track[column_x], bins=bins)
        ax_hist_y.hist(frame_one_track[column_y], orientation='horizontal', bins=bins)

    if commands is not None:
        commands(ax)

If I now plot some random data, everything looks fine and as intended.

df = pd.DataFrame(np.random.randn(1000, 3)*1000, columns=["a", "b", "c"])
cut = df["c"] < 20
frame1 = df[cut]
frame2 = df[~cut]

plt.figure(figsize=(6,6))
projection_plot(plt.subplot(), frame1, "a", "b", frame2, commands=lambda ax: (
    ax.legend(),
    ax.set_title("Random Values", y=1.4),
    ax.set_xlabel("column 0"),
    ax.set_ylabel("column 1")))

If I now try to set the scale of either (or both) axes to log, something breaks and the plot becomes unreadable:

plt.figure(figsize=(6,6))
projection_plot(plt.subplot(), frame1, "a", "b", frame2, commands=lambda ax: (
    ax.legend(),
    ax.set_yscale('log'),
    ax.set_title("Random Values", y=1.4),
    ax.set_xlabel("column 0"),
    ax.set_ylabel("column 1")))

With some of my data sets it seems to work fine, while with others it breaks just as it does with this random data. How can I fix this?

Also: since I'm relatively new to Python, is this good coding style? Passing multi-line lambdas for further configuration? I have the feeling that Ruby blocks ruined me…

score 2 · Accepted Answer · answered Mar 14 '18 at 02:00

I do not know exactly why this fails, but I imagine the problem is related to the data extending to zero and below, where a log scale is not defined.

In any case, you would need to set the limits of the plot manually:

ax.set_yscale('log')
ax.set_ylim(1,None)
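
As a minimal, self-contained sketch of the manual-limit approach (assuming random data similar to the question's): instead of a fixed lower limit of 1, the smallest strictly positive value in the data is used here. That choice is only an illustrative assumption. Either way, points at or below zero cannot be drawn on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) * 1000
y = rng.standard_normal(1000) * 1000  # contains negative values, like the question's data

fig, ax = plt.subplots()
ax.scatter(x, y, marker="x")
ax.set_yscale("log")

# Assumption for illustration: use the smallest strictly positive value
# as the lower limit; None leaves the upper limit unchanged.
positive_min = y[y > 0].min()
ax.set_ylim(positive_min, None)

lower, upper = ax.get_ylim()
```
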

Alternatively, you may want to use a symlog scale instead:

ax.set_yscale('symlog')

In this case, no limit adjustment is needed.
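
A rough sketch of how the symlog variant might look with a shared marginal histogram, again assuming random data like the question's. Only the right-hand histogram is built here to keep the example short. The array names and the seed are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) * 1000
y = rng.standard_normal(1000) * 1000  # positive and negative values

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x, y, marker="x")

# Marginal histogram sharing the y axis with the scatter plot.
divider = make_axes_locatable(ax)
ax_hist_y = divider.append_axes("right", 1.2, pad=0.1, sharey=ax)
ax_hist_y.hist(y, orientation="horizontal", bins=100)

# symlog is linear around zero and logarithmic further out,
# so negative values remain visible without manual limits.
ax.set_yscale("symlog")

y_low, y_high = ax.get_ylim()
```

Note that the histogram bins are still equally spaced in data units, so they will appear unevenly sized on the symlog axis.
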

ImportanceOfBeingErnest


My actual data set should not contain values less than or equal to 0, but apparently it did. Setting the limits manually works perfectly, thanks. I'm still puzzled, though, why the automatic detection of limits fails here when it doesn't on a regular scatter plot. – Philipp Stephan Mar 14 '18 at 03:50
I posted this as an [issue on the GitHub tracker](https://github.com/matplotlib/matplotlib/issues/10782). – ImportanceOfBeingErnest Mar 14 '18 at 14:30
