
I have a dataframe like below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
period0 = pd.date_range('1/1/2011', periods=50, freq='D')
period1 = pd.date_range('18/5/2012', periods=50, freq='D')
period2 = pd.date_range('7/11/2014', periods=50, freq='D')
df = pd.concat((pd.DataFrame(period0), pd.DataFrame(period1), pd.DataFrame(period2)), axis=0)

df['y'] = pd.DataFrame(np.random.rand(150,1))

The dates and periods are chosen arbitrarily, just to create some gaps between the date ranges.

When I plot the dataframe, matplotlib draws a straight line across the gaps between the dates:

plt.plot(df[0], df['y'])


I also tried a dotted line style, but the line across the gaps is still drawn:

plt.plot(df[0], df['y'], ':')


I also found a related question, but unfortunately it didn't solve my problem.

So, what should I do?

Muser

2 Answers

1

Matplotlib does not draw a line through NaN values, so set the values you do not want to see to NaN:

https://matplotlib.org/examples/pylab_examples/nan_test.html

For example:

# use the dates in column 0 as the index
df.index = pd.to_datetime(df[0])

# build a gap-free daily index covering the whole range
idx = pd.date_range(start='1/1/2011', end=max(period2), freq='D')

# reindex: existing 'y' values are kept, the missing days become NaN
df = df.reindex(idx)

# matplotlib breaks the line wherever 'y' is NaN
plt.plot(df.index, df['y'])
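
If you have several dataframes like this, the steps above could be wrapped in a small helper. This is only a sketch: `fill_date_gaps` and its parameters are made-up names, and it assumes each date appears at most once. It drops rows with a missing date before reindexing, because running the snippet above a second time on the already reindexed df puts repeated NaT labels into the index, and reindex then fails on the duplicate labels.

```python
import numpy as np
import pandas as pd


def fill_date_gaps(frame, date_col=0, value_col="y", freq="D"):
    """Return value_col on a gap-free date index; missing dates hold NaN.

    Hypothetical helper, not part of pandas. Assumes the dates are unique.
    """
    # Drop rows without a date (e.g. NaT rows left by an earlier reindex),
    # so the index built below has no duplicate labels.
    valid = frame.dropna(subset=[date_col])
    series = pd.Series(
        valid[value_col].to_numpy(),
        index=pd.to_datetime(valid[date_col]),
    )
    # Continuous index from the first to the last date actually present.
    full_index = pd.date_range(series.index.min(), series.index.max(), freq=freq)
    # Dates that are not in the data get NaN.
    return series.reindex(full_index)


# Small example: two 3-day blocks separated by a gap.
dates = pd.concat(
    [
        pd.Series(pd.date_range("2011-01-01", periods=3, freq="D")),
        pd.Series(pd.date_range("2011-01-10", periods=3, freq="D")),
    ],
    ignore_index=True,
)
example = pd.DataFrame({0: dates, "y": np.arange(6, dtype=float)})
filled = fill_date_gaps(example)
```

The result can then be plotted with `plt.plot(filled.index, filled)`; the line is interrupted wherever the value is NaN.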
cors
  • Actually I'm not able to write a function for it. It's a bit hard for me, since I have multiple dataframes like this. Could you please add some pseudo code? – Muser Jan 06 '19 at 21:31
  • The code works. Thanks for this valuable code. I really appreciate it. But when I try to run the code multiple times, I get this error: `ValueError: cannot reindex from a duplicate axis` – Muser Jan 07 '19 at 06:48
  • @ImportanceOfBeingErnest reindex(idx) adds the new index entries, and rows with no data in df['y'] get NaN – cors Jan 07 '19 at 08:35
  • My previous comment was meant to help improve the answer. – ImportanceOfBeingErnest Jan 07 '19 at 10:27
1

If you can't modify your existing index, you could group by day instead:

df.groupby(pd.Grouper(key=0, freq='1D'))['y'].last().plot()
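
To see why this works, here is a small sketch of what the grouping returns. The frame below is made up, and the date column is called "date" instead of 0 for readability. Grouping with freq='1D' creates one bin per calendar day between the first and the last date, and last() returns NaN for the days that have no rows, which is what interrupts the line.

```python
import pandas as pd

# Two 2-day blocks separated by a gap. The column name "date" is an
# assumption for this example; in the question the column label is 0.
example = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2011-01-01", "2011-01-02", "2011-01-06", "2011-01-07"]
        ),
        "y": [0.1, 0.2, 0.3, 0.4],
    }
)

# One bin per calendar day between the first and the last date.
daily = example.groupby(pd.Grouper(key="date", freq="1D"))["y"].last()

# Days without any row come back as NaN.
missing_days = daily[daily.isna()].index
```

Here `daily` covers every day from 2011-01-01 to 2011-01-07, and `missing_days` holds the three days in the gap.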
geof2832