
I have a pandas Series, Filled_series:

2015-05-03     0.00000
2015-05-09         NaN
2015-05-15     7.27943
2015-05-21         NaN
2015-05-27     7.26766
                ...   
2019-10-03    12.96608
2019-10-09     9.42112
2019-10-15     6.36359
2019-10-21    10.27396
2019-10-27     9.76496
Name: 367, Length: 274, dtype: float32

and I want to plot it, so I do:

fig, ax = plt.subplots(figsize=(15,15))
x_ax = Filled_series.index
ax.plot(x_ax, Filled_series)

Here is what I got: [image]

With ax.scatter it works fine, but not with ax.plot.

The line doesn't start at 2015-05-03, and I don't know why. How can I plot the whole time series?

max
Equinox
  • Have you tried changing the NaN values to 0? The line plot might be getting caught on them and skipping ahead, while the scatter plot probably just skips the point. Worth trying, but it may not be the solution. – Justin Oberle Dec 04 '20 at 16:56
  • Can you verify that the entirety of your date column/series index is actually datetime? Pandas shouldn't have any issues with the NaNs to my knowledge. – Patrick O'Connor Dec 04 '20 at 16:59
  • Does this answer your question? [matplotlib: drawing lines between points ignoring missing data](https://stackoverflow.com/questions/14399689/matplotlib-drawing-lines-between-points-ignoring-missing-data) or [Plot pandas dataframe containing NaNs](https://stackoverflow.com/q/13603181/8881141) – Mr. T Dec 05 '20 at 14:00

1 Answer

The plotting works fine, but a line plot can't draw a line between a point and a NaN. If you force matplotlib to draw markers (e.g. with the format string '-o'), you can see that it plots the non-NaN points but doesn't draw a line from them to a NaN. Lines are only drawn between consecutive existing points.

This is why you have this wide white space at the beginning of your plot. The points are there (and therefore the axis limits include them), but you can't see them because, by default, a line plot draws lines and not points (i.e. markers). This is also why the scatter plot works: it draws points rather than lines.
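To make the "lines only between consecutive existing points" rule concrete, here is a small sketch using the first five values from the question. The variable names `valid` and `segment_drawn` are made up for illustration; the check simply asks, for each pair of neighbouring rows, whether both values are non-NaN.

```python
import numpy as np
import pandas as pd

# First five rows of the series shown in the question
s = pd.Series(
    [0.0, np.nan, 7.27943, np.nan, 7.26766],
    index=pd.to_datetime(
        ["2015-05-03", "2015-05-09", "2015-05-15", "2015-05-21", "2015-05-27"]
    ),
)

# True where a value exists, False where it is NaN
valid = s.notna().to_numpy()

# A line segment between row i and row i+1 needs both ends to be valid
segment_drawn = valid[:-1] & valid[1:]

print(valid.sum(), "valid points,", segment_drawn.sum(), "drawable segments")
```

Every valid point here has a NaN on each side, so there are three points but no segment to draw, which matches the empty stretch at the start of the plot.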

It may become clearer with an example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [['2015-05-03',     0.00000],
        ['2015-05-09',     np.nan ],
        ['2015-05-15',     7.27943],
        ['2015-05-21',     np.nan ],
        ['2015-05-27',     7.26766],
        ['2015-06-03',     np.pi]]

df = pd.DataFrame(data, columns=['date_str', 'val'])
# convert the date strings to datetime
df['date'] = pd.to_datetime(df['date_str'], format='%Y-%m-%d')

# '-o' draws markers as well as lines, so isolated points become visible
df.plot(kind='line', x='date', y='val', rot=45, style='-o')

[image: output fig]

If you want to ignore the NaN entries, you can mask them:

mask = ~np.isnan(df['val'])
df[mask].plot(kind='line',x='date',y='val',rot=45)

[image: ignore NaNs]
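The same idea can be applied to the `ax.plot` call from the question, where the data is a Series rather than a DataFrame. This is only a sketch: the series below is rebuilt from the example values above (the real `Filled_series` has 274 rows), the name `connected` is made up, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for Filled_series, built from the example values above
filled_series = pd.Series(
    [0.0, np.nan, 7.27943, np.nan, 7.26766, np.pi],
    index=pd.to_datetime(
        ["2015-05-03", "2015-05-09", "2015-05-15",
         "2015-05-21", "2015-05-27", "2015-06-03"]
    ),
    name="367",
)

# Drop the NaN rows so every remaining point has a plottable neighbour
connected = filled_series.dropna()

fig, ax = plt.subplots(figsize=(15, 15))
# '-o' keeps markers on the real observations, so the bridged gaps stay recognisable
(line,) = ax.plot(connected.index, connected.to_numpy(), "-o")
fig.autofmt_xdate()  # rotate the date labels
```

Dropping the NaN rows joins the remaining points with straight lines across the gaps. Forward-filling with `Series.ffill()` would also give a continuous line, but it draws flat steps across the gaps and changes the plotted values, so which one is appropriate depends on what the gaps mean in the data.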

By the way, the pandas.DataFrame.plot() method uses matplotlib in the background. It is sometimes a bit more convenient because it adds labels and legends directly.

max
  • Isn't the point of the question to get a solution, e.g., with [`fillna(method="ffill")`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html)? – Mr. T Dec 05 '20 at 13:47
  • @Mr.T good point. I completely concentrated on explaining the behavior... I just added a solution based on masking the `NaN` entries – max Dec 05 '20 at 14:05