
I've got a pandas DataFrame, df, with an index named date and the columns columnA, columnB and columnC.

I am trying to make a scatter plot with the index on the x-axis and columnA on the y-axis, using the DataFrame plotting syntax.

When I try:

df.plot(kind='scatter', x='date', y='columnA')

I get KeyError: 'date', probably because date is the index, not a column. If I leave out x:

df.plot(kind='scatter', y='columnA')

I am getting an error:

ValueError: scatter requires an x and y column

so the index is not used as a default x-axis. If I pass the index itself:

df.plot(kind='scatter', x=df.index, y='columnA')

I get this error:

KeyError: "DatetimeIndex(['1818-01-01', '1818-01-02', '1818-01-03', '1818-01-04',
                          '1818-01-05', '1818-01-06', '1818-01-07', '1818-01-08',
                          '1818-01-09', '1818-01-10',
                          ...
                          '2018-03-22', '2018-03-23', '2018-03-24', '2018-03-25',
                          '2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29',
                          '2018-03-30', '2018-03-31'],
                         dtype='datetime64[ns]', name='date', length=73139, freq=None) not in index"

I can plot it if I use matplotlib.pyplot directly:

plt.scatter(df.index, df['columnA'])
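For reference, a self-contained sketch of that workaround, written with the object-oriented Axes.scatter instead of plt.scatter. The data are hypothetical (ten days of made-up values on a DatetimeIndex named date, with only columnA filled in), and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a daily DatetimeIndex named "date"
idx = pd.date_range("2018-03-01", periods=10, freq="D", name="date")
df = pd.DataFrame({"columnA": np.arange(10.0)}, index=idx)

fig, ax = plt.subplots()
# matplotlib accepts the DatetimeIndex directly as the x values
points = ax.scatter(df.index, df["columnA"])
ax.set_xlabel("date")
ax.set_ylabel("columnA")
```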

Is there a way to plot the index on the x-axis using the DataFrame kind= syntax?

William Miller
Kocur4d

3 Answers


This is kind of ugly (I think the matplotlib solution you used in your question is better, FWIW), but you can always create a temporary DataFrame with the index as a column using

df.reset_index()

If the index is unnamed, the new column will be called 'index'; if it has a name (like date in the question), the column takes that name instead. Assuming the index is unnamed, you could use

df.reset_index().plot(kind='scatter', x='index', y='columnA')
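In the question the index is named date, so reset_index() produces a column called date rather than index. A minimal sketch under that assumption, with a hypothetical ten-day sample standing in for the real data; it assumes a reasonably recent pandas and matplotlib, where a scatter plot accepts a datetime x column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import numpy as np
import pandas as pd

# Hypothetical sample data with a DatetimeIndex named "date"
idx = pd.date_range("2018-03-01", periods=10, freq="D", name="date")
df = pd.DataFrame({"columnA": np.arange(10.0)}, index=idx)

# reset_index() returns a new DataFrame; the named index becomes a "date" column
flat = df.reset_index()
print(flat.columns.tolist())  # ['date', 'columnA']

# df itself is unchanged, so this is the "temporary DataFrame" approach
ax = flat.plot(kind="scatter", x="date", y="columnA")
```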
Ami Tavory
  • Good to know, thank you. I think I know why it is not possible out of the box. It doesn't make much sense to do a scatter plot with `index`, as the index should be unique. In my case it would be better to use bins. – Kocur4d Apr 14 '18 at 19:22
  • @Kocur4d Thanks. Just an FYI: index does *not* have to be unique: `pd.DataFrame({'a': [1, 2]}, index=[1, 1])` is fine. I suspect the lack of support for scatters on index in Pandas is more of an oversight. – Ami Tavory Apr 14 '18 at 19:25

A simpler solution would be:

df['x1'] = df.index
df.plot(kind='scatter', x='x1', y='columnA')

That is, copy the index into an ordinary column before calling plot.
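A variant of the same idea, sketched with hypothetical sample data: instead of assigning into df, DataFrame.assign returns a copy that carries the extra x1 column, so the original frame is left unchanged. The use of assign is a substitution, not part of the answer as posted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import numpy as np
import pandas as pd

# Hypothetical sample data with a DatetimeIndex named "date"
idx = pd.date_range("2018-03-01", periods=10, freq="D", name="date")
df = pd.DataFrame({"columnA": np.arange(10.0)}, index=idx)

# assign() returns a copy with an extra "x1" column holding the index values,
# so df itself never gains the helper column
ax = df.assign(x1=df.index).plot(kind="scatter", x="x1", y="columnA")
```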

Clovis
  • I set `x=df.index` directly to get rid of an error this solution gave me. But for the most part, I found this answer to be correct. – carloswm85 Jul 28 '20 at 20:26
  • Yeah, coming back to this, I agree `df.plot(kind='scatter', x=df.index, y='columnA')` is better. – Clovis Feb 12 '21 at 07:16
  • Hmm, `df.plot(kind='scatter', x=df.index, y='columnA')` gives me an error: `raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: 'None of [RangeIndex(start=0, stop=4, step=1)] are in the [columns]'` – Martin R Sep 08 '22 at 21:26

At least in pandas > 1.4, the easiest option is this:

df['columnA'].plot(style=".")

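Series.plot puts the index on the x-axis by default, and style="." draws only point markers with no connecting line, so the result looks like a scatter plot. This also lets you mix scatter and line plots while using the standard pandas plot interface.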
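A small sketch of this on hypothetical data, with a second made-up column (columnB) to show a point series and a line series sharing one set of axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import numpy as np
import pandas as pd

# Hypothetical sample data with a DatetimeIndex named "date"
idx = pd.date_range("2018-03-01", periods=10, freq="D", name="date")
df = pd.DataFrame(
    {"columnA": np.arange(10.0), "columnB": np.arange(10.0)[::-1]},
    index=idx,
)

# The index is used as x; "." means point markers only, no connecting line
ax = df["columnA"].plot(style=".")
# An ordinary line plot drawn on the same axes
df["columnB"].plot(ax=ax, style="-")
lines = ax.get_lines()
```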

Amal Duriseti
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – user11717481 Apr 22 '22 at 22:03
  • This should be the accepted answer, as it doesn't require any augmentation of the original data frame. – Taylor Apr 27 '22 at 23:25
  • A line plot does not have some of the scatter plot options like coloring individual points. – Martin R Sep 08 '22 at 21:31
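If per-point coloring is needed, one option is to fall back to a real scatter plot on a reset index, where the c argument can name a column. A sketch assuming hypothetical data, with columnB used only as an example color source and viridis as an arbitrary colormap:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import numpy as np
import pandas as pd

# Hypothetical sample data with a DatetimeIndex named "date"
idx = pd.date_range("2018-03-01", periods=10, freq="D", name="date")
df = pd.DataFrame(
    {"columnA": np.arange(10.0), "columnB": np.linspace(0.0, 1.0, 10)},
    index=idx,
)

# c= takes a column name, so each point is colored by its columnB value
ax = df.reset_index().plot(
    kind="scatter", x="date", y="columnA", c="columnB", colormap="viridis"
)
```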