
I have a dataframe in pandas:

date_hour   score
2019041822  -5
2019041823  0
2019041900  6
2019041901  -5

where `date_hour` is an int in YYYYMMDDHH format, and `score` is an int.

When I plot, there is a long line connecting 2019041823 to 2019041900, as if all the values in between were absent (i.e. there is no score for 2019041824 to 2019041899, because no such hours exist).
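This shows where the long line comes from. With integer x values, matplotlib spaces points by their numeric difference, so the one-hour step from 23:00 to 00:00 spans 77 units on the axis. Here is a minimal sketch using the four sample values above:

```python
import pandas as pd

# The sample date_hour values from the question, stored as plain integers
date_hour = pd.Series([2019041822, 2019041823, 2019041900, 2019041901])

# Distance between consecutive points on an integer x-axis
steps = date_hour.diff().dropna().astype(int).tolist()
print(steps)  # [1, 77, 1]
```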

Is there a way to ignore these gaps/absent values so that the plot is continuous? (Some of my data is missing 2 days, so I get a long line, which is misleading.)

The red circles show the gap between nights (i.e. between Apr 18 23:00 and Apr 19 00:00).

I used:

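```python
import matplotlib.pyplot as plt

# gpb is my DataFrame with the date_hour (int) and score columns
fig, ax = plt.subplots()
x = gpb['date_hour']
y = gpb['score']
ax.plot(x, y, '.-')
display(fig)  # display() is provided by the notebook environment
```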

[Image: my plot of score against date_hour; the red circles mark the gaps]

I believe this happens because `date_hour` is an int. I tried converting it to str, but got an error: `ValueError: x and y must have same first dimension`
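The question doesn't show how the conversion to str was done, so this is an assumption. One common cause of that error is calling `str()` on the whole column, which returns a single string describing the Series rather than one string per row. Converting element-wise with `.astype(str)` keeps x and y the same length:

```python
import pandas as pd

date_hour = pd.Series([2019041822, 2019041823, 2019041900, 2019041901])
score = pd.Series([-5, 0, 6, -5])

# str() on a Series returns ONE string describing the whole Series,
# so x would have length 1 while y has length 4, which raises the ValueError
whole = str(date_hour)

# .astype(str) converts each value separately and keeps the length
per_row = date_hour.astype(str)

print(type(whole).__name__, len(per_row), len(score))  # str 4 4
```

Passing `per_row` to `ax.plot` would then work. However, matplotlib treats strings as categories and spaces them evenly, which hides every gap, including hours that are genuinely missing.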

Is there a way to plot so there are no gaps?

frank
  • Try `df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')` before plotting. – Quang Hoang May 02 '19 at 14:09
  • I would seriously consider loading `matplotlib` on its own for plotting. Also, please search Stack Overflow and Google for the countless threads about plotting datetime axes. Take [this](https://stackoverflow.com/questions/55922899/seaborn-plot-misplotting-x-axis-dates-from-pandas/55923577#55923577) as an example. – flurble May 02 '19 at 14:10
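Following the suggestion to work with matplotlib directly, here is a minimal sketch. It parses the integers into datetimes, plots them, and lets `matplotlib.dates.DateFormatter` label the ticks. The tick format `'%m-%d %H:00'` is an arbitrary choice for illustration.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})

# Convert to strings, then parse the YYYYMMDDHH pattern into datetimes
x = pd.to_datetime(df['date_hour'].astype(str), format='%Y%m%d%H')

fig, ax = plt.subplots()
ax.plot(x, df['score'], '.-')
# Label ticks with month-day and hour instead of the default date format
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d %H:00'))
fig.autofmt_xdate()  # rotate tick labels so they don't overlap
plt.show()
```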

1 Answer


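Try converting `date_hour` to timestamps with `df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')` before plotting. The x-axis then reflects actual time, so 23:00 and 00:00 are one hour apart instead of 77 integer units apart: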

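```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
# Parse the YYYYMMDDHH integers into real timestamps
df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')

df.plot(x='date_hour', y='score')
plt.show()
```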

Output:

[Image: line plot of score against the converted date_hour timestamps]
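To confirm that the conversion removes the artificial jump, here is a small sketch that compares consecutive timestamps after parsing. The `.astype(str)` step is added only to make the string parsing explicit:

```python
import pandas as pd

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
df['date_hour'] = pd.to_datetime(df['date_hour'].astype(str), format='%Y%m%d%H')

# Time between consecutive rows: every step, including 23:00 to 00:00, is one hour
steps = df['date_hour'].diff().dropna()
print(steps.tolist())
```

Because the x positions now come from these timestamps, the overnight points sit one hour apart like any other pair.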

If you don't want to modify your DataFrame, you can convert on the fly:

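```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
# Convert only for plotting; the integers in df.date_hour are left untouched
plt.plot(pd.to_datetime(df.date_hour, format='%Y%m%d%H'), df.score)
plt.show()
```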

which gives:

[Image: the same plot drawn with plt.plot]

Quang Hoang
  • Love it, but I'm not sure how to handle missing data in my set (there is no data on Apr 21), which is the second red circle in my example. – frank May 02 '19 at 14:34
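This addresses the follow-up about whole days with no data, such as Apr 21. One option is to reindex to a regular hourly frequency with `DataFrame.asfreq`. This sketch assumes you want the gap to stay visible rather than be bridged. Missing hours become NaN, and matplotlib does not draw line segments through NaN, so the empty day appears as a break instead of a misleading straight line. The sample values below are hypothetical, chosen only to include a missing day:

```python
import pandas as pd

# Hypothetical data: two hours on Apr 20, nothing on Apr 21, two hours on Apr 22
df = pd.DataFrame({'date_hour': [2019042022, 2019042023, 2019042200, 2019042201],
                   'score': [-5, 0, 6, -5]})
df['date_hour'] = pd.to_datetime(df['date_hour'].astype(str), format='%Y%m%d%H')

# Index by time and insert every missing hour as a NaN row
hourly = df.set_index('date_hour').asfreq('h')

# NaN rows break the line, so Apr 21 appears as a gap
ax = hourly['score'].plot(style='.-')
```

If you'd rather collapse the empty stretch entirely, plotting against the row position spaces the points evenly. The trade-off is that the x-axis no longer reflects real time.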