
I have a pandas dataframe which contains some sar output that I would like to plot in matplotlib. Sample data is below.

>>> cpu_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 70 entries, 0 to 207
Data columns (total 8 columns):
00:00:01    70 non-null datetime64[ns]
CPU         70 non-null object
%user       70 non-null float64
%nice       70 non-null float64
%system     70 non-null float64
%iowait     70 non-null float64
%steal      70 non-null float64
%idle       70 non-null float64
dtypes: float64(6), object(2)
memory usage: 4.4+ KB

>>> cpu_data
     00:00:01  CPU  %user  %nice  %system  %iowait  %steal  %idle
0    00:10:01  all   0.30   0.00     0.30     0.06     0.0  99.34
3    00:20:01  all   0.09   0.00     0.13     0.00     0.0  99.78
6    00:30:01  all   0.07   0.00     0.11     0.00     0.0  99.81
9    00:40:01  all   0.08   0.00     0.11     0.00     0.0  99.80
12   00:50:01  all   0.08   0.00     0.13     0.00     0.0  99.79
15   01:00:04  all   0.09   0.00     0.13     0.00     0.0  99.77
18   01:10:01  all   0.27   0.00     0.28     0.00     0.0  99.46
21   01:20:01  all   0.09   0.00     0.11     0.00     0.0  99.79
24   01:30:04  all   0.12   0.00     0.13     0.01     0.0  99.74
27   01:40:01  all   0.08   0.00     0.11     0.01     0.0  99.80
30   01:50:01  all   0.09   0.00     0.13     0.01     0.0  99.77

I want to plot using the timestamps as the x-axis. I have written the following code.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil.parser

cpu_data[cpu_data.columns[0]] = [dateutil.parser.parse(s) for s in cpu_data[cpu_data.columns[0]]]
plt.subplots_adjust(bottom=0.2)
plt.xticks(rotation=25)
ax = plt.gca()
ax.xaxis_date()
xfmt = md.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
cpu_data.plot(ax=ax)
plt.show()

But I get the following error:

ValueError: view limit minimum -5.1000000000000005 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

This doesn't make any sense, because I manually converted all of the timestamp strings to datetime objects:

cpu_data[cpu_data.columns[0]] = [dateutil.parser.parse(s) for s in cpu_data[cpu_data.columns[0]]]

But they don't appear to be the correct data type:

2018-09-30 00:10:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-09-30 00:20:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-09-30 00:30:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-09-30 00:40:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-09-30 00:50:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2018-09-30 01:00:01     <class 'pandas._libs.tslibs.timestamps.Timestamp'>

I have no idea how to fix this. I have tried manually setting the x-axis to start on a datetime object value using plt.xlim(cpu_data[cpu_data.columns[0]].iloc[0]) but this produces the same error. I really am lost here. Any guidance would be appreciated. I can provide more information if it would help.

EDIT:

I think the dates are not the correct data type (as indicated by the error). It seems like pandas keeps converting the data in the time column (column 0) to an object of type pandas._libs.tslibs.timestamps.Timestamp. I think it should be a datetime object, which is what matplotlib seems to expect.
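A minimal check of the suspicion in the edit, using the first timestamp printed above: `pandas.Timestamp` is a subclass of `datetime.datetime`, so the type by itself is probably not what matplotlib is rejecting. `to_pydatetime()` is shown only in case a plain `datetime` object is wanted anyway.

```python
import datetime

import pandas as pd

# First timestamp from the printed output above.
ts = pd.Timestamp("2018-09-30 00:10:01")

# pandas.Timestamp subclasses datetime.datetime, so it already is a datetime.
is_datetime = isinstance(ts, datetime.datetime)

# Explicit conversion to a plain datetime.datetime, if one is really needed.
as_datetime = ts.to_pydatetime()
print(is_datetime, type(as_datetime))
```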

Timothy Pulliam
  • What happens if you use `cpu_data.plot(ax=ax, x_compat=True)`? I can only repeat my previous comment though: The data shown here does not make it easy to help. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – ImportanceOfBeingErnest Sep 30 '18 at 14:52
  • Well, why is it that when I convert a timestamp from a string such as '01:30:04', it does not become a `datetime` object, but rather a `pandas._libs.tslibs.timestamps.Timestamp` object? I am thinking this is what is causing it. – Timothy Pulliam Sep 30 '18 at 15:00
  • Pandas likes to store datetimes in its own format. There is nothing wrong with that *per se* and as you found out already (e.g. when using `ax.plot`), matplotlib is perfectly capable of converting this for its own purpose. Also pandas will itself handle those perfectly fine via `df.plot()`. The problem you run into is that you want to use a matplotlib date formatter for a plot created with pandas locators. If the above mentioned solution (`x_compat`) does not help, I (or others as well I suppose) would need a [mcve] of the issue. – ImportanceOfBeingErnest Sep 30 '18 at 15:39
  • Never mind, I decided to just do the plots using `matplotlib.pyplot` the old-fashioned way. I'm not sure why it didn't work using pandas but it works now. I will post the code in a bit once I am done tinkering. – Timothy Pulliam Sep 30 '18 at 17:12
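A sketch of the pandas route suggested in the comments, under a few assumptions: the first column already holds parsed timestamps, and the frame below is an illustrative stand-in built from three of the sample rows (the date is taken from the printed Timestamps). Two changes relative to the question's code matter here. `DataFrame.plot` uses the index for x unless `x=` is given, and the index here is the integer `Int64Index`, which would plausibly explain the `-5.1` view limit. Passing `x_compat=True` keeps pandas from using its own time-series tick handling, so a matplotlib `DateFormatter` set after plotting can format the ticks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame built from the first sample rows in the question.
cpu_data = pd.DataFrame(
    {
        "00:00:01": pd.to_datetime(
            ["2018-09-30 00:10:01", "2018-09-30 00:20:01", "2018-09-30 00:30:01"]
        ),
        "CPU": ["all", "all", "all"],
        "%user": [0.30, 0.09, 0.07],
        "%system": [0.30, 0.13, 0.11],
        "%idle": [99.34, 99.78, 99.81],
    },
    index=[0, 3, 6],  # integer index, like the Int64Index in the question
)

time_col = cpu_data.columns[0]
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)

# Plot against the timestamp column (not the integer index); x_compat=True
# lets matplotlib's date handling, not pandas' period ticks, drive the axis.
cpu_data.plot(x=time_col, y=list(cpu_data.columns[2:]), ax=ax, x_compat=True)

# Set the matplotlib formatter after pandas has drawn the data.
ax.xaxis.set_major_formatter(md.DateFormatter("%H:%M:%S"))
plt.setp(ax.get_xticklabels(), rotation=25)

fig.canvas.draw()  # forces tick formatting; this is where the ValueError would appear
```

On a machine with a display, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of `fig.canvas.draw()`.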

1 Answer


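For those interested, this is how I ended up plotting the data using matplotlib:

import matplotlib.pyplot as plt
import matplotlib.dates as md

# Plot cpu
dates = cpu_data[cpu_data.columns[0]]  # assumed: the parsed timestamp column
plt.figure(1)
plt.subplots_adjust(bottom=0.2)
plt.xticks(rotation=25)
ax = plt.gca()
ax.xaxis_date()
xfmt = md.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.title(f'CPU usage on {remote_host}')  # remote_host is defined elsewhere in the original script
lines = plt.plot(dates, cpu_data[cpu_data.columns[2:]])  # one line per metric column
ax.legend(lines, [str(col) for col in cpu_data.columns[2:]])
plt.show()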
Timothy Pulliam