I am analysing a CSV file with Matplotlib/Python.

This is the CSV file: https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv

After importing the CSV file, I successfully plotted the energy consumption per 30 minutes with the following code. (Thank you, guys! See: Using Matplotlib, visualize CSV data)

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
with open(filename, 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) ==2 :
            date.append(row[0])
            number.append(row[1])

number=np.array(number)

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

However, I cannot work out how to visualise the energy consumption per day.

------------Edited (Thank you Florian!!)------------

I installed pandas and added pandas code to my script.

Now my code looks like this:

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
with open(filename, 'rb') as csvfile:

    df = pd.read_csv('total_watt.csv', parse_dates=[0], index_col=[0])
    df.resample('1D', how='sum')



for row in df:
        if len(row) == 2 :
            date.append(row[0])
            number.append(row[1])

number=np.array(number)

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

When I run this code, I get no error, but nothing is drawn in the graph. How can I solve this?

Suzuki Soma
  • You don't need to rewrite the file. You should, as always, read it all at once, sort the values (if not already sorted), put the values into _bins_ where they belong (where, in this case, every bin would be a time interval), perform some calculation inside every bin (average or peak, typically), and plot these results. – heltonbiker Jul 06 '15 at 12:05
  • Thank you very much! But how can I use bin!?!? – Suzuki Soma Jul 06 '15 at 12:21
  • Bin is a concept, is something like a "bucket", a container. In the case of numbers, each bin is an array of values. Usually you divide some interval in "bins" or "buckets" of equal size, so that each bin will have min and max values, and every value in your array will fit into its relative bins. Histograms work that way. So, you could have a "bin" for a given day, and every measurement happening between the start and the end of that day should be stored into that bin (that is, the array representing that bin). Take a look at this example: http://stackoverflow.com/a/19944158/401828 – heltonbiker Jul 06 '15 at 13:33
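The binning idea described in the comment above can be sketched in plain Python. This is only an illustration under stated assumptions: the four readings are copied from the sample data in the answer, and the names `readings`, `bins`, and `daily_totals` are made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# A few (timestamp, watt) readings copied from the sample data in this thread.
readings = [
    ("2011-04-18 23:22:00", 1092.20906629318),
    ("2011-04-18 23:52:00", 825.994417375886),
    ("2011-04-19 00:22:00", 2397.16672089666),
    ("2011-04-19 00:52:00", 1411.66659265233),
]

# One bin per calendar day: every reading goes into the bin of its date.
bins = defaultdict(list)
for stamp, watt in readings:
    day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
    bins[day].append(watt)

# Reduce each bin to a single number (here the sum; mean or max work the same way).
daily_totals = {day: sum(values) for day, values in sorted(bins.items())}

for day, total in daily_totals.items():
    print(day, round(total, 2))
```

Each key of `daily_totals` is one day and each value is the reduced bin, so the keys and values can be passed to `plt.plot` or `plt.bar` directly.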

1 Answer

Using pandas and the resample function could make your life easier.

Data

import io
import pandas as pd
content = '''timestamp  value
2011-04-18 16:52:00     152.684299188514
2011-04-18 17:22:00     327.579073188405
2011-04-18 17:52:00     156.826945856169
2011-04-18 18:22:00     330.202764488018
2011-04-18 18:52:00     1118.60404324133
2011-04-18 19:22:00     243.972250782998
2011-04-18 19:52:00     852.88815851216
2011-04-18 20:22:00     491.859992982456
2011-04-18 20:52:00     466.738983617709
2011-04-18 21:22:00     659.670303375527
2011-04-18 21:52:00     576.304871428571
2011-04-18 22:22:00     2497.20620579196
2011-04-18 22:52:00     2790.20392088608
2011-04-18 23:22:00     1092.20906629318
2011-04-18 23:52:00     825.994417375886
2011-04-19 00:22:00     2397.16672089666
2011-04-19 00:52:00     1411.66659265233
2011-04-19 01:22:00     2379.18391111111
2011-04-19 01:52:00     841.224212511672
2011-04-19 02:22:00     471.520308479532
2011-04-19 02:52:00     1189.78122544232
2011-04-19 03:22:00     343.7574197609
2011-04-19 03:52:00     336.486834795322
2011-04-19 04:22:00     541.401434220355
2011-04-19 04:52:00     316.106452883263
2011-04-19 05:22:00     502.502274561404
2011-04-19 05:52:00     314.832323976608
'''

df = pd.read_table(io.BytesIO(content.encode('UTF-8')), sep=r'\s{2,}', parse_dates=[0], index_col=[0], engine='python')

Using the resample function

See the documentation here: http://pandas-docs.github.io/pandas-docs-travis/

per 30 min

df = df.resample('30min').sum()  # older pandas releases spelled this df.resample('30min', how='sum')
Out[496]: 
                           value
timestamp                       
2011-04-18 16:30:00   152.684299
2011-04-18 17:00:00   327.579073
2011-04-18 17:30:00   156.826946
2011-04-18 18:00:00   330.202764
2011-04-18 18:30:00  1118.604043
2011-04-18 19:00:00   243.972251
2011-04-18 19:30:00   852.888159
2011-04-18 20:00:00   491.859993
2011-04-18 20:30:00   466.738984
2011-04-18 21:00:00   659.670303
2011-04-18 21:30:00   576.304871
2011-04-18 22:00:00  2497.206206
2011-04-18 22:30:00  2790.203921
2011-04-18 23:00:00  1092.209066
2011-04-18 23:30:00   825.994417
2011-04-19 00:00:00  2397.166721
2011-04-19 00:30:00  1411.666593
2011-04-19 01:00:00  2379.183911
2011-04-19 01:30:00   841.224213
2011-04-19 02:00:00   471.520308
2011-04-19 02:30:00  1189.781225
2011-04-19 03:00:00   343.757420
2011-04-19 03:30:00   336.486835
2011-04-19 04:00:00   541.401434
2011-04-19 04:30:00   316.106453
2011-04-19 05:00:00   502.502275
2011-04-19 05:30:00   314.832324

Per day

df = df.resample('1D').sum()  # older pandas releases spelled this df.resample('1D', how='sum')
Out[497]: 
                   value
timestamp               
2011-04-18  12582.945297
2011-04-19  11045.629711

Plot

Per 30 minutes

[Figure: plot of the values resampled per 30 minutes]
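A plot like the one referred to here could be produced along the following lines. This is a hedged sketch, not the code used for the original figure: it uses a shortened, comma-separated copy of the sample data, the current `resample(...).sum()` spelling, and a made-up output file name `resampled.png`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the sample data above (comma-separated for brevity).
content = """timestamp,value
2011-04-18 22:22:00,2497.20620579196
2011-04-18 22:52:00,2790.20392088608
2011-04-18 23:22:00,1092.20906629318
2011-04-18 23:52:00,825.994417375886
2011-04-19 00:22:00,2397.16672089666
2011-04-19 00:52:00,1411.66659265233
"""

df = pd.read_csv(io.StringIO(content), parse_dates=[0], index_col=[0])

# resample() returns new objects, so keep each result under its own name.
per_30min = df.resample("30min").sum()
per_day = df.resample("1D").sum()

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6))

# Line plot for the fine-grained series, bar plot for the daily totals.
per_30min["value"].plot(ax=ax_top, title="Per 30 minutes")
per_day["value"].plot(ax=ax_bottom, kind="bar", title="Per day")

ax_top.set_ylabel("value")
ax_bottom.set_ylabel("value")
fig.tight_layout()
fig.savefig("resampled.png")  # hypothetical output file name
```

Keeping the two resampled frames under separate names avoids resampling an already resampled frame, which is what happens when `df` is reassigned at every step.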

Hope it helps!

gowithefloww
  • Thank you Florian! It helped me a lot. But I can't get the result. It would be great if you could check my question (I edited it). – Suzuki Soma Jul 06 '15 at 13:52
  • It might be because you did not assign the resampled dataframe back to df: ``df = df.resample('1D', how='sum')``. Then you just need to run ``df.plot()`` – gowithefloww Jul 06 '15 at 14:30
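The two points behind the empty graph can be checked in isolation. The sketch below uses three readings from the sample data and the current `resample(...).sum()` spelling; the variable names are illustrative only. It shows that `resample` leaves `df` unchanged unless the result is assigned, and that looping over a DataFrame yields column labels rather than rows, so a `len(row) == 2` test on those labels would not normally match and the `date` and `number` lists would stay empty.

```python
import io

import pandas as pd

content = """timestamp,value
2011-04-18 23:22:00,1092.20906629318
2011-04-18 23:52:00,825.994417375886
2011-04-19 00:22:00,2397.16672089666
"""
df = pd.read_csv(io.StringIO(content), parse_dates=[0], index_col=[0])

# resample() does not modify df; without an assignment the result is discarded.
df.resample("1D").sum()
rows_before = len(df)      # still the three original readings

daily = df.resample("1D").sum()
rows_after = len(daily)    # one row per calendar day

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [item for item in df]

# Dates and totals are available directly from the index and the column,
# so no manual loop or strptime call is needed before plotting.
dates = daily.index.to_pydatetime()
totals = daily["value"].to_numpy()
```

With `dates` and `totals` in hand, `plt.plot(dates, totals)` works the same way as in the original script, or `daily.plot()` can be called as suggested in the comment.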