Recording headers in text file and making plots with subsequent data

Question

I am having trouble parsing a text file that I created with another program. The text file looks something like this:

velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
...
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287
...

I want to grab the number in the same row as velocity (the header) and save it as the title of the plot of the subsequent data. Every other row apart from the header represents the x and y coordinates of a shooting ball.

So if I have five different headers, I would like to see five different lines on a single graph with a legend displaying the different velocities.

Here is my Python code so far. I am close to what I want, but I am missing the first set of data (velocity = 4 m/s) and the colors in my legend don't match the line colors.

import matplotlib.pyplot as plt

xPoints = []
yPoints = []
fig, ax = plt.subplots()

with open('artilleryMotion.txt') as inf:

    for line in inf:
        column = line.split()

        if line.startswith("v"):
            velocity = column[1]
            ax.plot(xPoints, yPoints, label = '%s m/s' % velocity)
        else:
            xPoints.append(column[0])
            yPoints.append(column[1])

ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(ymin = 0)
ax.set_autoscaley_on(1)

I have been struggling with this for a while.

Edit_1: This is my output at the moment:

Artillery motion plot

Edit_2: I removed the indentation of the last lines of code. The color problem still occurs.

Edit_3: How would I go about saving the x and y points to a new array for each velocity? This may solve my issues.

Edit_4: Thanks to Charles Morris, I was able to create these plots. I just need to now determine if the initial upwards "arcing" motion by the ping pong ball for the higher velocities is representative of the physics or is a limitation of my code.

Artillery Motion Final

Take your `ax.legend() ... ax.set_auto...` outside of your for loop for starters. It should be at the same indentation as `fig, ax...` (in this case - not indented at all). — Chuck, Feb 04 '17 at 20:35
I removed the indentation, but it did not rectify the color problem. It shouldn't be indented anyway, so thanks for catching that. — Florent H, Feb 04 '17 at 20:43
The first time you iterate over all lines in the file, your `xPoints` and `yPoints` arrays are empty. Therefore, when you try and plot values for v = 4, you are plotting an empty array - hence your missing line. You need to populate the arrays first, and then plot them. At the moment, you are plotting the values for v = 4 in the line labelled v = 8, and for v = 8, the values for v = 16 and so on. — Chuck, Feb 04 '17 at 20:58
You also plot all the points individually - it would be better to store each "set" of values in different arrays. — Chuck, Feb 04 '17 at 21:00
@CharlesMorris That is exactly what I was thinking as well, but I don't know how to populate the `xPoints` and `yPoints` arrays before I first say `velocity = column[1]`. Otherwise, I get an error saying _could not convert string to float: velocity_. I was thinking of using a counter starting at `i = 0` and only executing the code I have above if `i >= 1`, but I am really struggling with the logic flow. — Florent H, Feb 04 '17 at 21:22
@CharlesMorris I like your last comment. Please see my last edit in my post. — Florent H, Feb 04 '17 at 21:24
`could not convert string to float: velocity` would be because you try and plot the string `"velocity"` when it expects a floating point number. Are the number of points always the same for each velocity? — Chuck, Feb 04 '17 at 21:36
Rather than reading line by line, look into `genfromtxt()` or `np.loadtxt` — Chuck, Feb 04 '17 at 21:41
To continue in the manner you are using, save your data to a dictionary whereby the key indicates which velocity your values correspond to. Then afterwards, you can plot each set of data by specifying which key to plot by. — Chuck, Feb 04 '17 at 21:55
Thank you Charles. I will look into `genfromtxt()`, `np.loadtxt`, and dictionaries. I have no experience in either, but hopefully it isn't too complicated. And no, each velocity has a different number of data points. — Florent H, Feb 04 '17 at 22:02

score 1 · Accepted Answer · edited May 23 '17 at 12:24

Edit: Ignore the old answer at the bottom, and see the solution below:

The following code works on an example text file, input.txt:

velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287

1) Import our text file

We use np.genfromtxt() to import the file. In this case, we specify dtype = float. This has the effect that numbers are imported as floats, and anything that cannot be parsed as a float (in this case the string 'velocity') is imported as NaN.

Sources: https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html; "How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?"

import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby

# dtype=float makes the non-numeric "velocity" token parse as nan
A = np.genfromtxt('input.txt',dtype=float)


>>>
array([[        nan,  4.        ],
       [ 0.        ,  0.        ],
       [ 0.08002848,  0.07023336],
       [ 0.15391108,  0.1285371 ],
       [ 0.22253932,  0.17632883],
       [ 0.28662194,  0.21464146],
       [ 0.34673203,  0.24422994],
       [ 0.40333978,  0.26563897],
       [        nan,  8.        ],
       [ 0.        ,  0.        ],
       [ 0.16915314,  0.12412104],
       [ 0.31201631,  0.22677885],
       [ 0.43588965,  0.31237151],
       [ 0.54535405,  0.38383248],
       [ 0.64348696,  0.44320333]])

2) Slice the imported array `A`

We can slice these arrays into separate X and Y arrays representing our X and Y values. Read up on array slicing in numpy here: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

In this case, we take all values with index = 0 (X) and all values with index 1 (Y):

# x values
X = A[:,0]
# y values
Y = A[:,1]

>>> X = array([        nan,  0.        ,  0.08002848,  0.15391108,  0.22253932,
    0.28662194,  0.34673203,  0.40333978,         nan,  0.        ,
    0.16915314,  0.31201631,  0.43588965,  0.54535405,  0.64348696])

>>> Y = array([ 4.        ,  0.        ,  0.07023336,  0.1285371 ,  0.17632883,
    0.21464146,  0.24422994,  0.26563897,  8.        ,  0.        ,
    0.12412104,  0.22677885,  0.31237151,  0.38383248,  0.44320333])

3) Split the data for each velocity.

Here we want to separate our X and Y values into one group per velocity. In X, the groups are separated by nan; in Y, they are separated by the velocity values 4, 8, 16, ...

For X, we split on nan. The nan entries are the result of genfromtxt() trying to parse 'velocity' as a float.

Sources: "numpy: split 1D array of chunks separated by nans into a list of the chunks"; "Split array at value in numpy"

For Y, we split the array on the numbers 4, 8, 16 etc. To do this, we keep only values that are either 0 or have a non-zero remainder when divided by 4 (using the Python % operator). Note that this relies on every velocity being a multiple of 4 and on no non-zero height being an exact multiple of 4.

Sources: "Split array at value in numpy"; "How to check if a float value is a whole number"; "Split NumPy array according to values in the array (a condition)"; "Find the division remainder of a number"; "How do I use Python's itertools.groupby()?"

XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]


>>> XX =
[[0.0,
0.080028475033446095,
0.15391108273711801,
0.22253932323492401,
0.28662194230027699,
0.34673202873968301,
0.403339781262399],
[0.0,
0.16915313637396201,
0.31201631161376098,
0.43588965369383897,
0.54535405460435704,
0.64348695656274102]]

>>> YY =
[[0.0,
0.070233359978727497,
0.12853710304884799,
0.17632882615604401,
0.21464146333504,
0.24422994493035899,
0.26563897207102699],
[0.0,
0.124121036173475,
0.22677884626730199,
0.31237151379774297,
0.38383248371064299,
0.44320333183928701]]
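
To see why groupby() splits the data this way, here is a minimal sketch on a toy list. It is only an illustration: the list X below is made up, and math.isfinite stands in for np.isfinite so that the snippet runs without numpy.

```python
from itertools import groupby
from math import isfinite, nan

# Toy stand-in for the X column: nan marks where a "velocity" header row was.
X = [nan, 0.0, 0.1, 0.2, nan, 0.0, 0.3]

# groupby() starts a new group every time the key (finite or not) changes.
runs = [(is_data, list(values)) for is_data, values in groupby(X, isfinite)]

# Keep only the runs of finite values, i.e. the data between two headers.
XX = [values for is_data, values in runs if is_data]
print(XX)  # [[0.0, 0.1, 0.2], [0.0, 0.3]]
```

The same idea applies to Y, except that the key function there is the modulo test instead of a finiteness check.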

4) Extract labels

Using a similar technique as above, we now keep only the values equal to our velocities 4, 8, 16 etc. In this case, we accept only those numbers which, when divided by 4, have 0 remainder and are not 0. We then convert each one to a string and append ' m/s'.

Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]

>>> Ylabels = [[4.0], [8.0]]
>>> Velocities = ['4.0 m/s', '8.0 m/s']

5) Plot

Plot values by index for each velocity.

fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(ymin = 0)
ax.set_autoscaley_on(1)

Code Altogether:

import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby

# "velocity" cannot be parsed as a float, so each header row becomes [nan, velocity]
A = np.genfromtxt('input.txt',dtype=float)

X = A[:,0]
Y = A[:,1]

# header values: non-zero multiples of 4 in the second column
Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]

# data values: the runs between the nan entries (X) and between the header values (Y)
XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]

# one line per velocity, labelled for the legend
fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(ymin = 0)
ax.set_autoscaley_on(1)
plt.show()
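
If the velocities are not all multiples of 4, a variant that does not depend on the modulo test may be easier to adapt. The sketch below is not the method used above: it locates the header rows through the nan in the first column and cuts the whole array there with np.split. SAMPLE is a shortened, made-up stand-in for input.txt, and the names is_header, blocks and trajectories are only illustrative.

```python
import io
import numpy as np

# Two short blocks in the same layout as input.txt (values shortened for illustration).
SAMPLE = """velocity 4
0 0
0.08 0.07
0.15 0.13
velocity 8
0 0
0.17 0.12
"""

A = np.genfromtxt(io.StringIO(SAMPLE), dtype=float)

# Header rows are the ones whose first column failed to parse (nan).
is_header = np.isnan(A[:, 0])
header_rows = np.flatnonzero(is_header)

# Split at every header row; the piece before the first header is empty, so drop it.
blocks = np.split(A, header_rows)[1:]

trajectories = {}
for block in blocks:
    velocity = block[0, 1]              # second column of the header row
    trajectories[velocity] = block[1:]  # remaining rows are the (x, y) points

for velocity, points in trajectories.items():
    print(velocity, points.shape)
```

Each value in trajectories is a two-column array, so a line can be drawn with plt.plot(points[:,0], points[:,1], label = '%s m/s' % velocity). This assumes the file starts with a header row and that no velocity appears twice, since a repeated velocity would overwrite the earlier entry.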

Old Answer:

The first time you iterate over all lines in the file, your xPoints and yPoints arrays are empty. Therefore, when you try and plot values for v = 4, you are plotting an empty array - hence your missing line.

You need to populate the arrays first, and then plot them. At the moment, you are plotting the values for v = 4 in the line labelled v = 8, and for v = 8, the values for v = 16 and so on.

Ignore: For the array population, try the following:

xPoints = []
yPoints = []
with open('artilleryMotion.txt') as inf:
    # initialize placeholder velocity variable
    velocity = 0
    for line in inf:
        column = line.split()

        if line.startswith("v"):
            velocity = column[1]

        else:
            xPoints.append({velocity: column[0]})
            yPoints.append({velocity: column[1]})

In the above, you save the data as a list of dictionaries (separate for x and y points), where the key is equal to the velocity that has been read in most recently, and the values are the x and y coordinates.

As a new velocity is read in, the placeholder variable velocity is updated, and so the x and y values can be identified according to the key that they have.

This allows you to separate your plots by dictionary key (look up D.iteritems() in Python 2 or D.items() in Python 3) and plot each set of points individually.
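
For anyone unsure how to go from "points tagged with a velocity" to actual plots, here is one possible sketch. It departs slightly from the list of dictionaries above: it uses a single dictionary that maps each velocity to its own pair of x and y lists, which makes the plotting loop shorter. The function names read_trajectories and plot_trajectories are made up for this example, the sample lines are shortened stand-ins for the real file, and the file is assumed to start with a velocity header.

```python
def read_trajectories(lines):
    """Group (x, y) points by the velocity header that precedes them."""
    trajectories = {}  # velocity string -> ([x values], [y values])
    xs = ys = None
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        if columns[0] == "velocity":
            # New header: start fresh lists and register them under this velocity.
            xs, ys = [], []
            trajectories[columns[1]] = (xs, ys)
        else:
            # Convert to float so the points are plotted as numbers, not text.
            xs.append(float(columns[0]))
            ys.append(float(columns[1]))
    return trajectories


def plot_trajectories(trajectories):
    """Draw one labelled line per velocity on a single set of axes."""
    import matplotlib.pyplot as plt  # imported here so parsing works without matplotlib

    fig, ax = plt.subplots()
    for velocity, (xs, ys) in trajectories.items():
        ax.plot(xs, ys, label="%s m/s" % velocity)
    ax.legend()
    ax.set_title("Ping-Pong Ball Artillery Motion")
    ax.set_xlabel("distance")
    ax.set_ylabel("height")
    ax.set_ylim(bottom=0)
    plt.show()


# Shortened sample in the same layout as the text file.
sample = ["velocity 4", "0 0", "0.08 0.07", "velocity 8", "0 0", "0.17 0.12"]
groups = read_trajectories(sample)
print(groups)

# With the real file, something like this should work:
# with open("artilleryMotion.txt") as inf:
#     plot_trajectories(read_trajectories(inf))
```

Because every line is only plotted after the whole file has been read, each label ends up attached to its own data, which avoids the off-by-one mismatch described above.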

Thanks for the great answer. I am fairly new to programming, so I am having a lot of trouble figuring out how to iterate through the list of dictionaries and create the plots. Could you provide any help here? — Florent H, Feb 05 '17 at 00:35
Sorry for taking so long to get back to you. Yes, that is perfect! Thanks so much for your help. I still have a lot to learn in Python! — Florent H, Feb 10 '17 at 05:05
@FlorentH Glad to have helped - you are very welcome. This was a tough one to work through and I too have a lot to learn in Python, it's part of the beauty of it I suppose: we'll all get there! :) Also, If my answer solved your problem, click the big checkbox to accept it as the answer + also consider voting it up. See http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work. — Chuck, Feb 10 '17 at 08:52
Done and Done! Since I have less than 15 reputation, my upvote is not publicly visible, but it said that it's still recorded. — Florent H, Feb 10 '17 at 22:10
See attached picture in original post to see what your code made! — Florent H, Feb 10 '17 at 22:26
@FlorentH Ah, that looks brilliant! Great job getting it all integrated properly - it can still be difficult to merge two separate code pieces together. Nice work :) — Chuck, Feb 11 '17 at 13:06