I recently decided to give matplotlib.pyplot a try, after having used gnuplot for scientific data plotting for years. I started out by simply reading a data file and plotting two columns, as gnuplot would do with plot 'datafile' u 1:2.
The requirements for my comfort are:
- Skip lines beginning with a # and skip empty lines.
- Allow arbitrary numbers of spaces between and before the actual numbers
- allow arbitrary numbers of columns
- be fast
Now, the following code is my solution to the problem. However, compared to gnuplot, it really is not as fast. This is a bit odd, since I read that one big advantage of py(plot/thon) over gnuplot is its speed.
import numpy as np
import matplotlib.pyplot as plt
import sys

datafile = sys.argv[1]
data = []
for line in open(datafile,'r'):
    if line and line[0] != '#':
        cols = filter(lambda x: x!='',line.split(' '))
        for index,col in enumerate(cols):
            if len(data) <= index:
                data.append([])
            data[index].append(float(col))

plt.plot(data[0],data[1])
plt.show()
What could I do to make the data reading faster? I had a quick look at the csv module, but it didn't seem to be very flexible with comments in files, and one still needs to iterate over all lines in the file.
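For comparison, here is a minimal sketch of what a NumPy-based reader might look like. It assumes numpy.loadtxt covers the requirements above: it treats # as the comment character, splits on any run of whitespace and skips empty lines. The helper name read_columns is hypothetical, and I have not timed it against the loop above, so whether it is actually faster would need measuring.

```python
import numpy as np


def read_columns(path):
    """Read a whitespace-separated data file into an array of columns.

    Hypothetical helper, not part of the original script. Relies on
    numpy.loadtxt, which ignores everything after '#', splits on any
    amount of whitespace, and skips empty lines.
    """
    # unpack=True transposes the result so that result[i] is column i;
    # ndmin=2 keeps the result two-dimensional even for a single row.
    return np.loadtxt(path, comments="#", unpack=True, ndmin=2)


# Possible usage, mirroring gnuplot's "plot 'datafile' u 1:2":
#   import sys
#   import matplotlib.pyplot as plt
#   cols = read_columns(sys.argv[1])
#   plt.plot(cols[0], cols[1])
#   plt.show()
```

This version assumes every data line has the same number of columns, whereas the loop above does not.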