
For some background info, I would like to create a scatter plot of several dataframes (each dataframe has been read from a CSV file) where the x value is the date and the y value is the water 'level'.

I've been trying to work out how to make a scatter plot where the x value is the date or the index. After trying a number of options, I feel as though this is the 'best' error I have got so far:

    KeyError: "None of [DatetimeIndex(['2017-11-04 00:00:00', '2017-11-04 01:00:00',\n ...
    '2018-02-26 11:00:00', '2018-02-26 12:00:00'],\n dtype='datetime64[ns]', name='date',
    length=2749, freq=None)] are in the [columns]"

I'm importing my data from a CSV file that looks something like this:

    date,               level
    2017-10-26 14:00:00, 700.1
    2017-10-26 15:00:00, 500.5
    2017-10-26 16:00:00, NaN
               ...

And I'm reading in the file like so:

    df = pd.read_csv("data.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df.set_index('date', inplace=True)
    df = df.loc['2017-11-04 00:00:00':]
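To see what the reading step above actually produces, here is a small sketch. It assumes a few rows copied from the sample data in this question, fed through `io.StringIO` instead of a real file. It shows that after `set_index('date')` the dates live in the index and are no longer a column, and that a partial date string can slice the `DatetimeIndex` with `.loc`.

```python
import io

import pandas as pd

# A few rows from the sample data1.csv; io.StringIO stands in for the file on disk
csv_text = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.set_index("date")

# After set_index, 'date' is the index name, not a column label
print(df.index.name)         # date
print(list(df.columns))      # ['level']
print("date" in df.columns)  # False

# Slicing a sorted DatetimeIndex with a string keeps rows from that timestamp onwards
subset = df.loc["2017-10-26 15:00:00":]
print(len(subset))           # 3
```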

Then this is my attempt at plotting the scatter plot:

    ax = df.plot()
    ax1 = df.plot(kind='scatter', x=df.index, y='level', color='r')

    # ... my other dataframes I'd like to plot on the same graph...
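The `KeyError` above is consistent with how `DataFrame.plot(kind='scatter')` treats `x` and `y`: as column labels, so passing `df.index` makes pandas look up the dates among the columns. The sketch below shows two ways around that, assuming sample rows from this question and the non-interactive `Agg` backend so it runs without a display. Option 1 turns the index back into a `date` column. Option 2 skips `DataFrame.plot` and gives the index values to matplotlib's `scatter` directly, which is what the comments below end up recommending.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to show a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the sample data1.csv; io.StringIO stands in for the file
csv_text = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"]).set_index("date")

# Option 1: move the index back into a regular 'date' column so x='date' is a valid label
flat = df.reset_index()
print(list(flat.columns))  # ['date', 'level']

# Option 2: pass the index values to matplotlib directly (NaN levels are simply not drawn)
fig, ax = plt.subplots()
ax.scatter(df.index.to_numpy(), df["level"].to_numpy(), color="r")
ax.set_xlabel("date")
ax.set_ylabel("level")
fig.autofmt_xdate()  # rotate the date tick labels so they don't overlap
```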

I've only just started using pandas, so apologies for my lack of understanding. I've been fiddling with different ways of importing the CSV (the `sep='\s*,\s*'` was one attempt), but to no avail. I'd greatly appreciate any advice, thank you.
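For the spaced CSV shown at the top of the question, either of the two reading approaches below should give clean `date` and `level` columns. This is a sketch assuming the sample rows from the question (with the `...` row left out) and `io.StringIO` in place of a file. Option A keeps the regex separator from the question, written as a raw string and with `engine='python'`, because regex separators are handled by pandas' Python parser. Option B is an alternative: keep the default comma separator and let `skipinitialspace=True` drop the spaces that follow each comma.

```python
import io

import pandas as pd

# The spaced sample from the question, with the elided "..." row omitted
csv_text = """date,               level
2017-10-26 14:00:00, 700.1
2017-10-26 15:00:00, 500.5
2017-10-26 16:00:00, NaN
"""

# Option A: regex separator (raw string), which requires the python engine
df_regex = pd.read_csv(
    io.StringIO(csv_text), sep=r"\s*,\s*", engine="python", parse_dates=["date"]
)

# Option B: default comma separator, skipping whitespace right after each delimiter
df_skip = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, parse_dates=["date"])

print(list(df_regex.columns))  # ['date', 'level']
print(list(df_skip.columns))   # ['date', 'level']
```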

Edit: more complete code

data1.csv:

    date,level
    2017-10-26 14:00:00,500.1
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,700.0
    2017-10-26 22:00:00,700.0

data2.csv:

    date,level
    2017-10-26 15:00:00,600.5
    2017-10-26 16:00:00,NaN
    2017-10-26 17:00:00,NaN
    2017-10-26 18:00:00,NaN
    2017-10-26 19:00:00,600.5
    2017-10-26 20:00:00,600.5
    2017-10-26 21:00:00,900.0
    2017-10-26 22:00:00,900.0
    2017-10-26 23:00:00,NaN

code:

    import pandas as pd
    import warnings
    import matplotlib.pyplot as plt
    warnings.filterwarnings("ignore")
    plt.style.use('fivethirtyeight')

    df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]

    df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']

    # x='date' raises KeyError: 'date' here, because set_index moved 'date' out of the columns
    ax1 = df.plot(kind='scatter', x='date', y='level', color='r')
    ax2 = df2.plot(kind='scatter', x='date', y='level', color='g', ax=ax1)

    plt.show()
  • `x` should be a column name of your dataframe. Does `x="date"` not work? Or removing that argument completely? – ImportanceOfBeingErnest Feb 18 '19 at 22:10
  • That was originally what I was trying to do, but unfortunately it doesn't work: with `x='date'` I get `KeyError: 'date'`, and when I remove the `x` argument I get a complaint saying that it has to be there. Also, when I enter `df.columns` I only get 'level' back, which could be due to the fact that I made the date my index, maybe? – chromestone Feb 18 '19 at 22:23
  • Oh yes. Keep "date" as column to be able to use it inside the `x` argument. – ImportanceOfBeingErnest Feb 18 '19 at 22:29
  • This may turn into a different question, but if I remove `df.set_index('date', inplace=True)` I get this error: `ValueError: view limit minimum -36837.575000000004 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units`. Could you shed some light on this? I'm not sure how I can do both at the same time. This error shows up at the `ax = df1.plot()` line. – chromestone Feb 18 '19 at 22:36
  • Note that it's really cumbersome without a [mcve], so I can only comment on individual steps instead of just providing a working answer. It could well be that pandas is not able to plot scatter plots with dates. What you can always do is `plt.scatter(df["date"].values, df['level'].values)` instead. – ImportanceOfBeingErnest Feb 18 '19 at 22:43
  • Another problem is that you first call `df.plot()`, which would probably use different units, so if you use matplotlib, make sure to plot that via `plt.plot` as well. – ImportanceOfBeingErnest Feb 18 '19 at 22:52
  • I've improved my question to follow the Minimal, Complete, and Verifiable guidelines, including some sample CSV files, and I will remove the `df.plot()` now. The `plt.scatter(df["date"].values, df['level'].values)` did work for me! I will try to continue with that if the updated question is no use. Thank you so much. – chromestone Feb 18 '19 at 23:11
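Putting the advice from these comments together (keep `date` as a regular column, plot with matplotlib's `scatter`, and draw everything through matplotlib rather than mixing in `df.plot()`), a two-dataframe version might look like the sketch below. It assumes the `data1.csv`/`data2.csv` samples from the question, read from `io.StringIO`, the `Agg` backend so it runs headless, and boolean masks instead of `.loc` slicing because no index is set. The labels `data1`/`data2` are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

data1 = """date,level
2017-10-26 14:00:00,500.1
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,700.0
2017-10-26 22:00:00,700.0
"""
data2 = """date,level
2017-10-26 15:00:00,600.5
2017-10-26 16:00:00,NaN
2017-10-26 17:00:00,NaN
2017-10-26 18:00:00,NaN
2017-10-26 19:00:00,600.5
2017-10-26 20:00:00,600.5
2017-10-26 21:00:00,900.0
2017-10-26 22:00:00,900.0
2017-10-26 23:00:00,NaN
"""

# Keep 'date' as a column (no set_index) and filter rows with boolean masks
df = pd.read_csv(io.StringIO(data1), parse_dates=["date"])
df = df[df["date"] >= pd.Timestamp("2017-10-26 15:00:00")]

df2 = pd.read_csv(io.StringIO(data2), parse_dates=["date"])
df2 = df2[df2["date"] <= pd.Timestamp("2017-10-26 22:00:00")]

# Both series go through matplotlib on the same Axes, so the x units stay consistent
fig, ax = plt.subplots()
ax.scatter(df["date"].to_numpy(), df["level"].to_numpy(), color="r", label="data1")
ax.scatter(df2["date"].to_numpy(), df2["level"].to_numpy(), color="g", label="data2")
ax.set_xlabel("date")
ax.set_ylabel("level")
ax.legend()
fig.autofmt_xdate()
```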

1 Answer


In case anyone runs into the same problem, I found a workaround, as described here: pandas scatter plotting datetime

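Instead of `kind='scatter'`, I used a regular `df.plot()` (which takes x from the date index) and just added `style='o'`, so only markers are drawn, as seen below: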

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("data1.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df.set_index('date', inplace=True)
    df = df.loc['2017-10-26 15:00:00':]
    ax = df.plot(style='o')  # markers only; x comes from the DatetimeIndex

    df2 = pd.read_csv("data2.csv", parse_dates=['date'], sep=r'\s*,\s*', engine='python')
    df2.set_index('date', inplace=True)
    df2 = df2.loc[:'2017-10-26 22:00:00']
    df2.plot(ax=ax, style='o')  # draw on the same axes as df

    plt.show()