0

A newer question, "Using matplotlib/pandas/python, I cannot visualize data as values per 30mins and per days", is strongly related to this one.

I want to visualize CSV data with Matplotlib.

The following is my code, named 1.30mins.py:

import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

style.use('ggplot')

x,y =np.loadtxt('total_watt.csv',
                unpack = True,
                delimiter = ',')

plt.plot(x,y)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

When I ran 1.30mins.py, I got the following error message.

(DataVizProj)Soma-Suzuki:Soma Suzuki$ python 1.30mins.py
Traceback (most recent call last):
  File "1.30mins.py", line 10, in <module>
    delimiter = ',')
  File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/numpy/lib/npyio.py", line 856, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: invalid literal for float(): 2011-04-18 13:22:00

This is my total_watt.csv

2011-04-18 21:22:00 659.670303375527
2011-04-18 21:52:00 576.304871428571
2011-04-18 22:22:00 2,497.20620579196
2011-04-18 22:52:00 2,790.20392088608
2011-04-18 23:22:00 1,092.20906629318
2011-04-18 23:52:00 825.994417375886
2011-04-19 00:22:00 2,397.16672089666
2011-04-19 00:52:00 1,411.66659265233

From what I have found by searching, I need to add converters or a date format such as %y-%m-%t to my program.

My Python version is 2.7.6 and my matplotlib version is 1.4.2.

Suzuki Soma
  • Your error is not related to the file you are trying to read, but to your matplotlib. Which versions of Python and matplotlib are you using? Secondly, I'd recommend trying the [datetime dtype](http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html) for your data and removing the `,` from your last column. – Daniel Lenz Jul 06 '15 at 09:13
  • My matplotlib version is: >>> import matplotlib as mpl >>> print mpl.__version__ 1.4.2, and my Python version is: (DataVizProj)Soma-Suzuki:~ Suzuki$ python -V Python 2.7.6 – Suzuki Soma Jul 06 '15 at 09:22
  • I get a different error, namely "ValueError: invalid literal for float(): 2011-04-18 21:22:00 659.670303375527". I use Python 2.7.6 and Matplotlib 1.4.3 and I have no problems importing style. Also, note that plt is not defined in your code. – Edgar H Jul 06 '15 at 09:22
  • Thanks, I defined "plt" and added it to my question as well. – Suzuki Soma Jul 06 '15 at 09:32
  • And I got another error. I edited my question. – Suzuki Soma Jul 06 '15 at 09:33

2 Answers

3

Your data

2011-04-18 21:22:00 659.670303375527
2011-04-18 21:52:00 576.304871428571
...

is not delimited by spaces or commas. It could be regarded as having fixed-width columns however. np.genfromtxt can read fixed-width data. Instead of passing a string to delimiter, pass a sequence of ints representing the width of each field.


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import style
style.use('ggplot')

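# Fixed-width read: the first 19 characters are the timestamp; the second
# width (10**6) is simply large enough to capture the rest of each line.
x, y = np.genfromtxt('total_watt.csv',
                     unpack=True,
                     delimiter=[19, 10**6], dtype=None)

# Convert the date strings to matplotlib date numbers, and strip the
# thousands separators before converting the values to float.
x = mdates.datestr2num(x)
y = np.array(np.char.replace(y, ',', ''), dtype=float)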

fig, ax = plt.subplots()
ax.plot(x, y)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
xfmt = mdates.DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)

fig.autofmt_xdate()
plt.show()

yields [image: plot of the data]
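To make the width list easier to picture, here is a rough plain-Python equivalent of the fixed-width split, run on two lines copied from the question. It is only a sketch of the idea, not how `np.genfromtxt` is implemented; the names `TIMESTAMP_WIDTH` and `split_fixed_width` are made up for illustration.

```python
from datetime import datetime

# Two sample lines copied from the question; the second has a thousands comma.
lines = [
    "2011-04-18 21:22:00 659.670303375527",
    "2011-04-18 22:22:00 2,497.20620579196",
]

# Hypothetical constant: the width of the timestamp field,
# i.e. len("2011-04-18 21:22:00") == 19.
TIMESTAMP_WIDTH = 19


def split_fixed_width(line):
    """Split one line by position instead of by a delimiter character."""
    stamp = line[:TIMESTAMP_WIDTH]   # first field: exactly 19 characters
    value = line[TIMESTAMP_WIDTH:]   # second field: everything that is left
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    watts = float(value.replace(",", ""))  # drop the thousands separator
    return when, watts


parsed = [split_fixed_width(line) for line in lines]
for when, watts in parsed:
    print(when, watts)
```

The second width in `delimiter=[19, 10**6]` plays the role of `line[TIMESTAMP_WIDTH:]` here: it is just a number larger than any line, so the second field takes whatever remains.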

unutbu
  • Thank you very much. I don't know why, but both your code and Negative Probability's code ran for me, and the results are pretty much the same. – Suzuki Soma Jul 06 '15 at 11:21
  • In the csv file "https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv" there is no ","... but when I open this csv file with Numbers, a "," is added. – Suzuki Soma Jul 06 '15 at 11:23
  • But when I opened this file with Xcode, the csv file looks like 2011-04-18 13:22:00,925.840613752523 2011-04-18 13:52:00,483.295891812865 2011-04-18 14:22:00,915.761633660131, with a comma between the time and the value. I think that is why both of your code samples ran successfully!! – Suzuki Soma Jul 06 '15 at 11:25
  • It looks like the "raw" [actual data](https://raw.githubusercontent.com/camenergydatalab/EnergyDataSimulationChallenge/master/challenge2/data/total_watt.csv) is comma-delimited. So you could use `delimiter=','` instead of `delimiter=[19, ...]`. – unutbu Jul 06 '15 at 11:47
  • Thank you very much!! I'll do it:) – Suzuki Soma Jul 06 '15 at 11:52
  • Oops. When I changed "delimiter=[19, ...]" to "delimiter=','" I got the following error: "ValueError: too many values to unpack" – Suzuki Soma Jul 06 '15 at 11:56
  • What does "delimiter=[19, 10**6], dtype=None" mean!?!??! – Suzuki Soma Jul 06 '15 at 11:59
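
A note on the "too many values to unpack" error from the comments above. A likely cause (an assumption, not confirmed in the thread) is that with `delimiter=','` and `dtype=None`, `np.genfromtxt` returns a one-dimensional structured array with one record per line, so unpacking it into `x, y` fails unless the file happens to have exactly two lines. The sketch below reads the three comma-delimited lines quoted in the comments from an in-memory string standing in for total_watt.csv, and pulls the columns out by the auto-generated field names `f0` and `f1`. It assumes Python 3 and a NumPy version that accepts the `encoding` argument (1.14 or later).

```python
import io
from datetime import datetime

import numpy as np

# In-memory stand-in for total_watt.csv, using the lines quoted in the comments.
raw = io.StringIO(
    "2011-04-18 13:22:00,925.840613752523\n"
    "2011-04-18 13:52:00,483.295891812865\n"
    "2011-04-18 14:22:00,915.761633660131\n"
)

# dtype=None lets genfromtxt guess a type per column; because the columns
# differ (text and float), the result is a 1-D structured array.
data = np.genfromtxt(raw, delimiter=',', dtype=None, encoding='utf-8')

# Access columns by field name instead of unpacking the array of records.
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data['f0']]
watts = data['f1']

print(data.shape)
print(stamps[0], watts[0])
```

With the columns separated this way, `ax.plot(stamps, watts)` should work in place of the `mdates.datestr2num` step.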
0

I don't know whether numpy has the functionality to read datetime objects directly. However, if you are NOT looking for an elegant solution, here is some quick and dirty code to do what you want using two other modules csv and datetime.

I use the file 'sample.csv' (note where I have placed commas):

     2011-04-18 21:22:00, 659.670303375527
     2011-04-18 21:52:00, 576.304871428571

And the code is

     import csv
     import datetime

     from matplotlib import style
     from matplotlib import pylab as plt

     style.use('ggplot')

     filename = 'sample.csv'
     date = []
     number = []

     # 'rb' is what the csv module expects on Python 2; on Python 3, use
     # open(filename, newline='') instead.
     with open(filename, 'rb') as csvfile:
         csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
         for row in csvreader:
             if len(row) == 2:  # skip blank or malformed lines
                 # Parse the timestamp and convert the value to a float.
                 date.append(datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S'))
                 number.append(float(row[1]))

     plt.plot(date, number)

     plt.title('Example')
     plt.ylabel('Y axis')
     plt.xlabel('X axis')

     plt.show()

This gives me the following graph: [image: Graph of Result]

If you are looking for a more elegant solution using numpy, I'm sure someone will know a better way.

Edgar H
  • Thank you very much!! It worked!! It would be great if you could tell me what "delimiter=',', quotechar='|'" means. I tried to search, but I could not understand it. – Suzuki Soma Jul 06 '15 at 10:27
  • delimiter determines what separates cells; here it is a comma. If you set it to a space instead (delimiter=' '), each line would be split into three fields, because there are two spaces per line. quotechar is the character used to quote fields that contain special characters such as the delimiter. If the answer is what you are looking for, accept it so other people don't try to answer anymore. – Edgar H Jul 06 '15 at 10:32
  • Actually, my csv file is quite big and has many rows. I don't think I can add a "," to every row... – Suzuki Soma Jul 06 '15 at 10:47
  • In the example you posted, sometimes there is a comma and sometimes there isn't. Maybe use a space as the delimiter? Then, taking your first row, row[0] = "2011-04-18", row[1] = "21:22:00", row[2]="659.670303375527". You just need to search for and delete any commas. The third row will look like this: row[0] = "2011-04-18", row[1] = " 22:22:00", row[2]=" 2,497.20620579196". So maybe you need to delete the number in front of the comma as well. I'm not sure what it means. – Edgar H Jul 06 '15 at 11:00
  • In the csv file "github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/…; there is no ","... but when I open this csv file with Numbers, a "," is added. But when I opened this file with Xcode, the csv file looks like 2011-04-18 13:22:00,925.840613752523 2011-04-18 13:52:00,483.295891812865 2011-04-18 14:22:00,915.761633660131, with a comma between the time and the value. I think that is why both of your code samples ran successfully!! – Suzuki Soma Jul 06 '15 at 11:29
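
Since `delimiter` and `quotechar` came up in the comments above, here is a small sketch of how `csv.reader` treats them, using in-memory strings rather than a file. It assumes Python 3; the helper `parse` and the `|`-quoted sample line are made up for illustration and do not come from the actual data.

```python
import csv
import io


def parse(text, **kwargs):
    """Hypothetical helper: run csv.reader over a string and collect the rows."""
    return list(csv.reader(io.StringIO(text), **kwargs))


# A comma delimiter splits each line into timestamp and value.
comma_rows = parse("2011-04-18 21:22:00,659.670303375527\n", delimiter=",")

# A space delimiter splits on both spaces, giving date, time, and value.
space_rows = parse("2011-04-18 21:22:00 659.670303375527\n", delimiter=" ")

# Without quoting, a thousands comma is treated as one more delimiter.
unquoted_rows = parse("2011-04-18 22:22:00,2,497.20620579196\n", delimiter=",")

# With quotechar='|', a field wrapped in |...| keeps its inner comma.
quoted_rows = parse("2011-04-18 22:22:00,|2,497.20620579196|\n",
                    delimiter=",", quotechar="|")

for rows in (comma_rows, space_rows, unquoted_rows, quoted_rows):
    print(rows)
```

In the answer's code the `quotechar='|'` argument only matters if the file actually contains `|` characters; for the raw comma-delimited file quoted in the comments it has no effect.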