
I'm new to pandas and have difficulty using its power conveniently.

I have a large DataFrame with experimental data for two different tests, which I'd like to compare. Ideally, the data would be displayed in a plot.

## what I have:
import pandas as pd

ids = [
    'Bob','Bob',
    'John', 'John',
    'Mary', 'Mary',
    ]
var = [
    'a', 'b',
    'a', 'b',
    'a', 'b',
    ]
data = [
    10,11,
    15,14,
    10,15
    ]
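# zip() returns an iterator in Python 3; list() materializes it so it can be printed and reused
dataset = list(zip(ids, var, data))
print(dataset)

columns = ['ids', 'var', 'data']
df = pd.DataFrame(data=dataset, columns=columns)
print(df)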
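The same DataFrame can also be built directly from a dict of columns, which avoids the intermediate zip step. A minimal sketch with the values above:

```python
import pandas as pd

# Same values as the ids/var/data lists, given column by column
df = pd.DataFrame({
    'ids': ['Bob', 'Bob', 'John', 'John', 'Mary', 'Mary'],
    'var': ['a', 'b', 'a', 'b', 'a', 'b'],
    'data': [10, 11, 15, 14, 10, 15],
})
print(df)
```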

## what I want:
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig = plt.figure()
ax1 = fig.add_subplot(111)
for i in range(len(ids)):
    # Rows alternate a, b per id, so i // 2 is the id's position on the x-axis.
    # Only the first point of each var is labelled, so the legend gets one entry per var.
    if var[i] == 'a':
        ax1.plot(i // 2, data[i], 'rs', label='var a' if i == 0 else '_nolegend_')
    else:
        ax1.plot((i - 1) // 2, data[i], 'bo', label='var b' if i == 1 else '_nolegend_')
majorLocator = MultipleLocator(1)
ax1.xaxis.set_major_locator(majorLocator)
ax1.grid()
ax1.margins(0.05)
ax1.set_xlabel('ids')
ax1.set_ylabel('data')
ax1.legend(loc='best', numpoints=1)
plt.show()

How can I do this properly, without lots of nested for loops? Bonus points if I could use the ids as the x-axis tick labels.

Thanks a lot in advance, Daniel

damada

2 Answers


I'm not quite sure what your end goal is, but if cphlewis's suggestion to use seaborn isn't what you were looking for, you could instead convert your DataFrame to a MultiIndex and plot it that way.

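import matplotlib.pyplot as plt
# ids, var, data: the lists from the question; a list of two lists as index gives a two-level MultiIndex
mi = pd.DataFrame(data=data, index=[ids, var], columns=['data'])
f, a = plt.subplots()
mi.plot(kind='bar', ax=a)
plt.show()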

[Image: MultiIndex plot results]

You might also find this post helpful.
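Building on the MultiIndex idea, here is a sketch (not part of the original answer) that unstacks the `var` level into columns, so each var becomes its own series. That reproduces the question's red-square/blue-circle plot without any loop and with the ids as x tick labels. It assumes the question's `ids`, `var` and `data` lists, repeated here so it runs on its own:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same lists as in the question
ids = ['Bob', 'Bob', 'John', 'John', 'Mary', 'Mary']
var = ['a', 'b', 'a', 'b', 'a', 'b']
data = [10, 11, 15, 14, 10, 15]

# Two-level (ids, var) index, as in the answer
mi = pd.DataFrame(data=data, index=[ids, var], columns=['data'])

# unstack() moves the inner level ('var') into columns:
# one row per id, one column per var
wide = mi['data'].unstack()

fig, ax = plt.subplots()
# One series per column; 'rs' and 'bo' draw markers only, no connecting line
wide.plot(ax=ax, style=['rs', 'bo'])
ax.set_xticks(range(len(wide.index)))
ax.set_xticklabels(wide.index)
# Widen the x-limits so the markers at both ends are not clipped by the axes edges
ax.set_xlim(-0.5, len(wide.index) - 0.5)
ax.set_xlabel('ids')
ax.set_ylabel('data')
ax.grid()
ax.legend(loc='best', numpoints=1)
# plt.show()  # uncomment to display the figure when running as a script
```

If grouped bars are preferred instead, `wide.plot(kind='bar')` draws one group per id and labels the x-axis with the ids automatically.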

andrewgcross

seaborn does a lot of this for you, very flexibly:

import seaborn as sns
# factorplot was renamed catplot in seaborn 0.9; x and y are passed as keywords
sns.catplot(data=df, x='ids', y='data', hue='var', kind='bar')


(Depending on the seaborn version, it may also restyle the matplotlib plotting defaults; these can be changed or reset.)

If you want to subset the data, pass the subset as the data argument:

sns.catplot(data=df[df.isin({'ids': ['Bob', 'Mary']}).any(axis=1)],
            x='ids', y='data', hue='var', kind='bar')
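To unpack the mask used there: `df.isin({'ids': [...]})` checks only the `ids` column against the list and returns False for every other column, so `.any(axis=1)` is True exactly for the rows whose id is in the list. A minimal pandas-only sketch, using the question's data, showing that it selects the same rows as the simpler `df['ids'].isin(...)`:

```python
import pandas as pd

df = pd.DataFrame({
    'ids': ['Bob', 'Bob', 'John', 'John', 'Mary', 'Mary'],
    'var': ['a', 'b', 'a', 'b', 'a', 'b'],
    'data': [10, 11, 15, 14, 10, 15],
})

# Element-wise check; columns not named in the dict come back all False
cell_mask = df.isin({'ids': ['Bob', 'Mary']})
# Row-wise reduction: keep a row if any of its cells matched
row_mask = cell_mask.any(axis=1)

# The same rows via a single-column check
simple_mask = df['ids'].isin(['Bob', 'Mary'])

print(df[row_mask])
```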


  • That plot was made with the seaborn style turned off.
  • For a more complicated mask, set up the mask separately; see the pandas docs.
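Setting up a more complicated mask separately might look like the sketch below, which also filters on `var` as asked in the comments. Each condition is a boolean Series, combined with `&`; the `query` line is an equivalent string-based form. The data is the question's; the particular filter values are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ids': ['Bob', 'Bob', 'John', 'John', 'Mary', 'Mary'],
    'var': ['a', 'b', 'a', 'b', 'a', 'b'],
    'data': [10, 11, 15, 14, 10, 15],
})

# Build each condition on its own, then combine: & = and, | = or, ~ = not.
# (When writing comparisons inline, wrap each in parentheses: & binds tighter than ==.)
keep_ids = df['ids'].isin(['Bob', 'Mary'])
keep_vars = df['var'] == 'a'
mask = keep_ids & keep_vars
subset = df[mask]

# The same filter written as a query string
subset_q = df.query("ids in ['Bob', 'Mary'] and var == 'a'")

print(subset)
```

Passing `subset` as the `data` argument of the `sns.catplot` call then plots only those rows.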
cphlewis
  • This seems like exactly what I need, plot-wise! Now, how can I combine this with the data filtering capabilities of pandas? For example, I only want to plot vars a and b, not c. Would I need to change the DataFrame before plotting? – damada Apr 18 '15 at 19:29
  • yes, seaborn and pandas work very well together -- see http://pandas.pydata.org/pandas-docs/stable/indexing.html for MANY MANY ways to subset `df`. – cphlewis Apr 18 '15 at 19:44
  • (and I put a filtered example into the answer) – cphlewis Apr 18 '15 at 21:28
  • Fantastic! I've just realized my Pandas module is outdated, too, thanks to your example. – damada Apr 20 '15 at 09:52