
I have some Python code written by a previous engineer and I am trying to extend it. The script reads a value from a serial string input and logs it against time in a CSV file. The CSV file looks like this:

12h35m15s,0.01t
12h35m16s,0.02t
12h35m17s,0.05t
12h35m18s,0.15t
12h35m19s,0.21t
12h35m20s,0.23t
12h35m21s,0.20t
12h35m22s,0.21t
12h35m23s,0.22t
12h35m24s,0.26t

and so on...

I have added a section to the code so that pressing a button uses matplotlib to generate a graph of the data in the CSV file.

The problem is that matplotlib cannot plot the time in the format 12h35m15s because it isn't a float. I changed the code to remove the h, m and s from the CSV file, but then the seconds value causes trouble. The seconds are not zero-padded (1 second is written as 1s, not 01s), so my graphed values would be, for example:

12358  (at 12h35m8s)
12359  (at 12h35m9s)
123510 (at 12h35m10s)
123511 (at 12h35m11s)
123512 (at 12h35m12s)
123513 (at 12h35m13s)

Then, when the time rolls over to 12h36m:

12361  (at 12h36m1s)
12362  (at 12h36m2s)

and so on...

This means that when my graph is plotted, 12h36m1s (12361) is treated as a lower value than 12h35m10s (123510), so the data points end up in the wrong order on the graph.
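To make the ordering problem concrete, here is a minimal sketch. The helper names `naive_key` and `seconds_since_midnight` are made up for illustration and are not part of the original script. It compares the "strip the letters" value with a true seconds-since-midnight value for three stamps.

```python
import re

def naive_key(stamp):
    """Drop the h/m/s letters and read the remaining digits as one integer."""
    return int(re.sub(r"[hms]", "", stamp))

def seconds_since_midnight(stamp):
    """Parse each field separately so missing zero-padding does not matter."""
    hours, minutes, seconds = map(
        int, re.fullmatch(r"(\d+)h(\d+)m(\d+)s", stamp).groups()
    )
    return hours * 3600 + minutes * 60 + seconds

stamps = ["12h35m9s", "12h35m10s", "12h36m1s"]  # already in chronological order

naive = [naive_key(s) for s in stamps]
seconds = [seconds_since_midnight(s) for s in stamps]

print(naive)    # the last value is smaller than the middle one
print(seconds)  # strictly increasing, matching the real order
```

Parsing the fields individually (or with `datetime.strptime`) keeps the order intact, whereas concatenating the digits does not.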

I need to find one of two solutions:

1) Correct the time format so that matplotlib plots the time correctly.
2) Set my x axis values to be the number of data records being plotted, i.e. if I have 50 data records for y, my x axis is simply 1-50 and not time.

Can anyone help on this?

My graph code is:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
import numpy as np

s = open('log0.csv','r').read()

CSVunits = ('kN', 'T', 'lb', 'kg', 't') 
for c in CSVunits:
    s = ''.join( s.split(c) )

out_file = open('graph.csv','w')
out_file.write(s)
out_file.close()

data = [['13h10m5s'],['13h20m5s'],['13h30m5s'],['13h40m5s'],['13h50m5s'],['14h0m5s']]
#data = np.loadtxt('graph.csv', delimiter=',', skiprows=4,usecols=[0])
x = [mdates.date2num(dt.datetime.strptime(x[0], '%Hh%Mm%Ss')) for x in data]
y = np.loadtxt('graph.csv', delimiter=',', skiprows=4,usecols=[1])

fig,ax = plt.subplots(1)

ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
ax.plot_date(x, y, '-') # '-' adds the line on the graph joining points

plt.xlabel('Time')
plt.ylabel('Load Value')
plt.title('Graph Showing Load Against Time\n')

plt.show()

When I run this, I get the error:

ValueError: invalid literal for float(): 12h35m15s

Just for clarity, in my data example the 1, 2 and 3 were just to indicate the rows in a CSV file, there is no column or data containing 1, 2 and 3. It is just the time and the float value I have in my CSV.

Thanks

CPS13
  • Is there meant to be a "n" in one of the csv file examples? Specifically this one - `3 12h35n17s 0.05` – DavidG Nov 02 '17 at 13:40
  • Take a look at the matplotlib plot_date function - https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot_date.html?highlight=matplotlib%20python%20plot_date#matplotlib.axes.Axes.plot_date – DaveL17 Nov 02 '17 at 13:47
  • Possible duplicate of [Plotting time in Python with Matplotlib](https://stackoverflow.com/questions/1574088/plotting-time-in-python-with-matplotlib) – wwii Nov 02 '17 at 14:59
  • I've just updated my answer to include your second question `Set my x axis value to be the number of data records being plotted.` – DaveL17 Nov 02 '17 at 15:14
  • I have updated my answer below to reflect the format of the new example data. – DaveL17 Nov 03 '17 at 13:58

2 Answers


Suppose your data looks like this:

1   12h35m8s    0.02
2   12h35m9s    0.04
3   12h35m10s   0.06
4   12h35m11s   0.07
5   12h35m12s   0.08
6   12h35m13s   0.06
7   12h35m15s   0.05
8   12h35m16s   0.02
9   12h35m17s   0.03
10  12h36m1s    0.04
11  12h36m2s    0.03

You can read it in with pandas and plot it directly:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/timetable2.txt", delim_whitespace=True, header=None, 
                 names=["time","quantity"], index_col=0)
df["time"] = pd.to_datetime(df["time"], format="%Hh%Mm%Ss")

df.set_index("time").plot()

plt.show()


You may also plot with matplotlib directly, which gives you more control over how the times are displayed (e.g. you can use the original format "%Hh%Mm%Ss" on the axis).

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates

df = pd.read_csv("data/timetable2.txt", delim_whitespace=True, header=None, 
                 names=["time","quantity"], index_col=0)
df["time"] = pd.to_datetime(df["time"], format="%Hh%Mm%Ss")

plt.plot(df["time"],df["quantity"])
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%Hh%Mm%Ss"))

plt.show()


ImportanceOfBeingErnest
  • thanks for the reply but I do not want to use Pandas. I tried using it but it comes up with loads of errors and stops my script from running. I spent hours yesterday trying to get it to work so would prefer not to use it. – CPS13 Nov 03 '17 at 09:46

I would still recommend @ImportanceOfBeingErnest's answer (pandas is a great tool). But since I'd already started it, I'll post another possible approach. This is a simplified example (so that the important bits are obvious). Moreover, I think you are making it harder on yourself than you need to.

UPDATE 2: Based on OP's revisions to the question, here is a working example using the new source data. I've broken the data formatting into multiple lines to make it easier to see what's going on. I changed the x_labels line to reflect that the observation number is not in the source data as originally posted.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt

with open('test.csv', 'r') as f:
    data = f.readlines()
    # ['12h35m15s,0.01t\n', '12h35m16s,0.02t\n', '12h35m17s,0.05t\n', '12h35m18s,0.15t\n', '12h35m19s,0.21t\n', '12h35m20s,0.23t\n', '12h35m21s,0.20t\n', '12h35m22s,0.21t\n', '12h35m23s,0.22t\n', '12h35m24s,0.26t']

data_to_plot = [[d.strip('t\n')] for d in data]  # Split the lines into lists and strip the 't' and new line
# [['12h35m15s,0.01'], ['12h35m16s,0.02'], ['12h35m17s,0.05'], ['12h35m18s,0.15'], ['12h35m19s,0.21'], ['12h35m20s,0.23'], ['12h35m21s,0.20'], ['12h35m22s,0.21'], ['12h35m23s,0.22'], ['12h35m24s,0.26']]

data_to_plot = [d[0].split(',') for d in data_to_plot]  # Break each observation into separate strings
# [['12h35m15s', '0.01'], ['12h35m16s', '0.02'], ['12h35m17s', '0.05'], ['12h35m18s', '0.15'], ['12h35m19s', '0.21'], ['12h35m20s', '0.23'], ['12h35m21s', '0.20'], ['12h35m22s', '0.21'], ['12h35m23s', '0.22'], ['12h35m24s', '0.26']]

x = [mdates.date2num(dt.datetime.strptime(row[0], '%Hh%Mm%Ss')) for row in data_to_plot]
y = [float(row[1]) for row in data_to_plot]  # Convert the load strings to floats so the y axis is numeric

x_labels = list(range(1, len(data_to_plot) + 1))  # Observation numbers 1..N

fig,ax = plt.subplots(1)

ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))

ax.plot_date(x, y, '-')

plt.xticks(x, x_labels)

plt.show()

IMPORTANT: The Update 2 example will still not work unless you ensure that your source data timestamps all adhere to the ##h##m##s format. Otherwise, we'll need to add some additional code to make them uniform.
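One way to check the timestamps before plotting is sketched below. It assumes the stamps have already been split out of each row. The function name `parse_timestamps` and the sample list are illustrative only, not part of the original script. As far as I know, `strptime` itself is lenient about zero-padding for `%H`, `%M` and `%S`, so a stamp such as `12h36m1s` should parse as it stands. Text that is not a timestamp at all (a stray header line, for instance) raises `ValueError` and is collected separately here.

```python
import datetime as dt

TIME_FORMAT = "%Hh%Mm%Ss"  # same format string as in the example above

def parse_timestamps(stamps):
    """Return (parsed datetimes, rejected strings) for a list of raw stamps."""
    parsed, rejected = [], []
    for stamp in stamps:
        try:
            parsed.append(dt.datetime.strptime(stamp.strip(), TIME_FORMAT))
        except ValueError:
            # Not in the expected shape; keep it aside for inspection
            rejected.append(stamp)
    return parsed, rejected

# Hypothetical sample: three valid stamps and one stray non-timestamp value
parsed, rejected = parse_timestamps(["12h35m9s", "12h35m10s", "12h36m1s", "g"])

print(parsed)
print(rejected)
```

Printing the rejected values usually shows quickly whether the problem is a header row, a blank line, or a genuinely malformed stamp.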


[For clarity, prior answers deleted. See revision history to see prior answers to the question.]

DaveL17
  • Hi, thanks for the replies. I tried getting Pandas to work but I keep getting numerous Python package errors which won't allow my script to run. I am therefore trying @DaveL17 suggestion. But I am coming up with an issue, I am trying to load the data out of my CSV file but I get the error "ValueError: time data 'g' does not match the format 'Hh%Mm%Ss' I cannot see though how to post my updated script on here?! – CPS13 Nov 03 '17 at 09:06
  • Please see the "UPDATED 03/11/17" section of my original post. Thanks. – CPS13 Nov 03 '17 at 09:11
  • @CPS13 We want to help you solve your problem, but it's very hard when we don't have all the information that you're working with. Please copy 10-15 rows of your *actual* data. My example won't work for you because it's written to the three column data in your example. Also, there are elements of your code that aren't relevant to solving your problem and it would be better if those were removed (there's no need to share code that writes to the file when your question is about reading data to plot). Please update and we'll take a look. – DaveL17 Nov 03 '17 at 11:38
  • I have now amended the data in the original post to show what I have in my CSV. The CSV file is titled 'log0.csv' and my code opens this file, creates a new file called 'graph.csv' where it saves the same data but strips out the 't' in the second data field. – CPS13 Nov 03 '17 at 11:59
  • I have the code working now other than pulling the time out of the CSV file. Thanks – CPS13 Nov 03 '17 at 12:23
  • @CPS13 Are you still receiving the error `ValueError: time data 'g' does not match the format 'Hh%Mm%Ss'`? This error would not be caused by the example data you posted. Take a look at your source data--specifically the time stamp column--and make sure that it always meets the structure of `##h##m##s`. Second, you shouldn't need to create a new file just to strip the `t` from the second column. You can strip it from the data as you load it with `.strip('t')`. – DaveL17 Nov 03 '17 at 13:18
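A rough sketch of stripping the unit while loading, without writing an intermediate file, could look like this. The `UNITS` tuple mirrors the `CSVunits` list from the question. The `parse_row` helper is hypothetical, and the inline sample (including the `kN` row) stands in for `log0.csv`.

```python
import csv
import datetime as dt
import io

# Assumed unit suffixes, copied from the question's CSVunits tuple
UNITS = ("kN", "kg", "lb", "T", "t")

def parse_row(stamp, raw_value):
    """Turn one 'time,value<unit>' record into (datetime, float)."""
    for unit in UNITS:
        if raw_value.endswith(unit):
            raw_value = raw_value[:-len(unit)]  # drop the trailing unit only
            break
    return dt.datetime.strptime(stamp, "%Hh%Mm%Ss"), float(raw_value)

# Inline sample data; in the real script this would be open('log0.csv', newline='')
sample = io.StringIO("12h35m15s,0.01t\n12h35m16s,0.02t\n12h36m1s,0.15kN\n")

rows = [parse_row(*record) for record in csv.reader(sample) if record]
times = [t for t, _ in rows]
loads = [v for _, v in rows]

print(times)
print(loads)
```

Removing the unit only from the end of the value field avoids touching letters elsewhere in the file, which a blanket split-and-join over the whole text would also delete.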