
I have created a basic scatter plot to compare two variables using altair. I expect the variables to be strongly correlated and the points should end up on or close to the line of identity.

How can I add the line of identity to the plot?

I would like it to be a line similar to those created by mark_rule, but extending diagonally instead of vertically or horizontally.

Here is as far as I have gotten:

import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8],[1.8, 2]], 100)

df = pd.DataFrame(norm, columns=['var1', 'var2'])

chart = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
    alt.X('var1'),
    alt.Y('var2'),
).interactive()

line = alt.Chart(
    pd.DataFrame({'var1': [-4, 4], 'var2': [-4, 4]})).mark_line().encode(
            alt.X('var1'),
            alt.Y('var2'),
).interactive()

chart + line

There are two problems with this example: the line doesn't extend forever when zooming (as a rule mark does), and the plot is automatically scaled to the line's endpoints rather than to the points alone.

Rikard N

1 Answer

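It's not perfect, but you could make the line much longer than the data range and set the scale domain explicitly on the scatter plot's axes, so the view is no longer scaled to the line's endpoints.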

import altair as alt
import numpy as np
import pandas as pd

norm = np.random.multivariate_normal([0, 0], [[2, 1.8],[1.8, 2]], 100)

df = pd.DataFrame(norm, columns=['var1', 'var2'])

chart = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
    alt.X('var1', scale=alt.Scale(domain=[-4,4])),
    alt.Y('var2', scale=alt.Scale(domain=[-4,4])),
).interactive()

line = alt.Chart(
    pd.DataFrame({'var1': [-100, 100], 'var2': [-100, 100]})).mark_line().encode(
            alt.X('var1'),
            alt.Y('var2'),
).interactive()

chart + line
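
As a variation on the same idea, the hard-coded [-4, 4] domain could instead be derived from the points, and the long line could be clipped to the plot area. The following is only a minimal sketch: the helper names shared_domain and scatter_with_identity, the pad fraction, and the reach value are all made up for illustration, and the line is still finite, so it will end if you zoom out far enough.

```python
def shared_domain(xs, ys, pad=0.05):
    """Return one [lo, hi] domain covering both variables, padded slightly."""
    values = list(xs) + list(ys)
    lo, hi = min(values), max(values)
    margin = (hi - lo) * pad  # pad is a fraction of the data range (assumed default)
    return [lo - margin, hi + margin]


def scatter_with_identity(df, x="var1", y="var2", reach=1000):
    """Scatter plot of df[x] against df[y] with a long, clipped identity line."""
    # Imported here so shared_domain() stays usable without plotting libraries.
    import altair as alt
    import pandas as pd

    # Use the same domain on both axes so the identity line runs corner to corner.
    scale = alt.Scale(domain=shared_domain(df[x], df[y]))

    points = alt.Chart(df, width=500, height=500).mark_circle(size=100).encode(
        alt.X(x, scale=scale),
        alt.Y(y, scale=scale),
    )

    # Endpoints far outside the data (reach is an arbitrary illustrative value);
    # clip=True keeps the line from being drawn outside the plot area.
    ends = pd.DataFrame({x: [-reach, reach], y: [-reach, reach]})
    line = alt.Chart(ends).mark_line(clip=True).encode(alt.X(x), alt.Y(y))

    # Make the layered chart interactive as a whole.
    return (points + line).interactive()
```

With the df from above, scatter_with_identity(df) would return the layered chart. Because the domain is computed from the points only, the initial view depends on the data and not on how far the line extends.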
dominik