
I have a CSV file of power levels at several stations (4 in this case, though "HUT4" does not appear in this short excerpt):

2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)

The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.

The data is easy to read into a dataframe, to index, and to put into a dictionary, but I can't get the right syntax to make a meaningful plot. I want either all stations on a single plot, sharing a timeline that is wide enough for all of them, or separate plots, maybe a subplot for each station. If I do:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Power_Log.csv',names=['DT','Station','Power'])
df2=df.groupby(['Station']) # group the rows by 'Station'
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT',y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')

I do get a plot but the X axis is not right for each station.

I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.

I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.

2 Answers


I'd rather plot the typical way, something like:

import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('Station_Power.png')  # savefig() needs a file name

( http://matplotlib.org/users/pyplot_tutorial.html )

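Re more subplots: calling plt.plot() multiple times, once for each data series, draws every series on the same axes. For a separate panel per series, create the axes with plt.subplots() and call plot() on each one.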

P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
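For the "one subplot per station" variant asked about in the question, a minimal sketch could look like the following. It assumes the DT column can be parsed as datetimes, uses the excerpt from the question in place of Power_Log.csv, and the output file name is made up for illustration. Each panel keeps its own x axis, since the question says the axes do not need to be synchronised.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The excerpt from the question stands in for Power_Log.csv.
csv_text = """2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
"""
df = pd.read_csv(io.StringIO(csv_text), names=["DT", "Station", "Power"],
                 parse_dates=["DT"])  # parse DT as datetimes, not strings

groups = list(df.groupby("Station"))  # [(station name, that station's rows), ...]

# One row of the figure per station; squeeze=False keeps `axes` two-dimensional
# even when there is only one station.
fig, axes = plt.subplots(nrows=len(groups), ncols=1,
                         figsize=(8, 3 * len(groups)), squeeze=False)
for ax, (stn, data) in zip(axes[:, 0], groups):
    ax.plot(data["DT"], data["Power"], marker="o")
    ax.set_title(stn)
    ax.set_ylabel("Power")
    ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.savefig("Station_Power_subplots.png")  # hypothetical file name
```

Passing sharex=True to plt.subplots() would instead give all panels a common timeline.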

LetMeSOThat4U
  • The problem is that each datetime matches only one station data item. The nulls in the dataset cause the plot to be broken. I have progressed toward a solution, from the 5th code line above . . – user3808985 Jul 29 '14 at 08:19
  • Then you have a different problem than specified in the post... See here for handling NaNs (not a number - is that what you mean by "null"?): http://stackoverflow.com/questions/10939391/matplotlib-issues-when-nan-first-in-list – LetMeSOThat4U Jul 29 '14 at 08:39

Sorry for the comment above where I needed to add code. Still learning . .

From the 5th code line:

import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')

This does more or less what I want, except that the DateFormatter line does not work. I would like to shorten my datetime data to show just the date. If it placed ticks at midnight that would be brilliant, but that is not strictly necessary.
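One likely contributor to the DateFormatter problem is the format string: in strftime-style codes, %d is the day of the month and %m the month, while %M means minutes and %D is not the day-of-month code. The sketch below shows a day/month/year formatter combined with mdates.DayLocator(), which places one tick per day at midnight. It assumes the x values are real datetimes rather than strings; the 48 hourly readings and the file name are invented purely to give the axis a two-day range.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data: 48 hourly readings starting 2014-06-21 20:00.
times = [dt.datetime(2014, 6, 21, 20, 0) + dt.timedelta(hours=h) for h in range(48)]
power = [74 - 0.5 * h for h in range(48)]

fig, ax = plt.subplots()
ax.plot(times, power)

ax.xaxis.set_major_locator(mdates.DayLocator())   # one major tick per day, at 00:00
date_format = mdates.DateFormatter("%d/%m/%Y")    # day/month/year, no time part
ax.xaxis.set_major_formatter(date_format)
ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.savefig("Station_Power_dates.png")  # hypothetical file name

# The formatter can be called directly to see the label it would produce.
label = date_format(mdates.date2num(dt.datetime(2014, 6, 22)))
```

If DT is still a column of strings, the formatter has no dates to work with, so converting it first (for example with pd.to_datetime) is probably needed as well.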

The key to getting a continuous plot is to call interpolate() on the data before plotting it.

Because the x scale differs from station to station, a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale, even though its datetime values cover more or less the same range as the other HUTs.
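A possible explanation, offered as an assumption rather than a diagnosis: if DT is left as strings, each station's rows may be spaced by row position instead of by time, so a station with fewer records covers less of the axis. The sketch below parses DT as datetimes and draws every station on one shared Axes, so each point is placed by its actual timestamp. It uses the excerpt from the question in place of Power_Log.csv, and the axis labels are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The excerpt from the question stands in for Power_Log.csv.
csv_text = """2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
"""
df = pd.read_csv(io.StringIO(csv_text), names=["DT", "Station", "Power"],
                 parse_dates=["DT"])  # real datetimes give a true time axis

fig, ax = plt.subplots(figsize=(10, 4))
for stn, data in df.groupby("Station"):
    # Each line uses only its own station's rows, so there are no gaps to fill.
    ax.plot(data["DT"], data["Power"], marker="o", label=stn)

ax.set_xlabel("Time")
ax.set_ylabel("Power (%)")
ax.legend(loc="lower right")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
fig.savefig("Station_Power.png")
```

Plotting each group separately this way avoids the nulls that appear when all stations share one column per timestamp, so interpolate() is not needed for the lines to be continuous.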