I'm new to pandas and am struggling to use it effectively.
I have a large dataframe with experimental data from two different tests that I'd like to compare. Ideally, I'd like to display the data in a plot.
## what I have:
import pandas as pd
ids = [
'Bob','Bob',
'John', 'John',
'Mary', 'Mary',
]
var = [
'a', 'b',
'a', 'b',
'a', 'b',
]
data = [
10,11,
15,14,
10,15
]
# list() makes the zipped rows reusable and printable on Python 3 as well
dataset = list(zip(ids, var, data))
print(dataset)
columns = ['ids', 'var', 'data']
df = pd.DataFrame(data=dataset, columns=columns)
print(df)
## what I want:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
fig = plt.figure()
ax1 = fig.add_subplot(111)
for i, ii in enumerate(ids):
    # each id occupies two consecutive rows, so integer division maps both to one x position
    if var[i] == 'a':
        ax1.plot(i // 2, data[i], 'rs', label='var a')
    else:
        ax1.plot((i - 1) // 2, data[i], 'bo', label='var b')
majorLocator = MultipleLocator(1)
ax1.xaxis.set_major_locator(majorLocator)
ax1.grid()
ax1.margins(0.05)
ax1.set_xlabel('ids')
ax1.set_ylabel('data')
ax1.legend(loc='best', numpoints=1)
fig.show()
How can I do this properly, without lots of nested for loops? As a bonus, I'd like to use the ids as the x-axis tick labels.
Thanks a lot in advance, Daniel
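One possible direction, as a minimal sketch rather than a definitive answer: reshape the frame with `DataFrame.pivot` so that each id becomes a row and each var becomes a column, then plot each column once. This assumes every (ids, var) pair occurs exactly once, as in the sample data; with duplicates, `pivot` raises an error and an aggregating reshape would be needed instead. The names `wide` and `positions` and the output file name `comparison.png` are illustrative and not part of the original post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question, built directly as columns.
df = pd.DataFrame({
    "ids": ["Bob", "Bob", "John", "John", "Mary", "Mary"],
    "var": ["a", "b", "a", "b", "a", "b"],
    "data": [10, 11, 15, 14, 10, 15],
})

# Reshape from long to wide: one row per id, one column per var.
# Assumes each (ids, var) pair is unique.
wide = df.pivot(index="ids", columns="var", values="data")

fig, ax = plt.subplots()
positions = list(range(len(wide.index)))

# One plot call per variable, so the legend gets one entry each.
ax.plot(positions, wide["a"], "rs", label="var a")
ax.plot(positions, wide["b"], "bo", label="var b")

# Use the ids as x-axis tick labels.
ax.set_xticks(positions)
ax.set_xticklabels(wide.index)

ax.grid()
ax.margins(0.05)
ax.set_xlabel("ids")
ax.set_ylabel("data")
ax.legend(loc="best", numpoints=1)

# Hypothetical output file name; plt.show() would work for interactive use.
fig.savefig("comparison.png")
```

Because each variable is drawn with a single `plot` call, no explicit loop over rows is needed, and the legend no longer repeats the same label once per data point.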