
I have a time-sampled data set with essentially a two-column index (timestamp, ID). However, some timestamps do not have a sample point for a given index.

How can I make a stackplot with Matplotlib for this kind of data?

import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))

b = np.unique(df.B)
plt.stackplot(np.unique(df.A),
              [df[df.B==_b].C for _b in b],
              labels=['B:{0}'.format(_b) for _b in b],
)
plt.xlabel('A')
plt.ylabel('C')
plt.legend(loc='upper left')
plt.show()

When I run this program, Python raises:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When I manually fill in the missing data points (see blank lines in string literal), the plot works fine.


Is there a straightforward way to "insert" zero records for the missing sample data, or to have Matplotlib plot with holes? (This is similar to this question, but I have two columns functioning as indices, and I don't know how to adapt that solution to my problem.)

mojo

1 Answer

You could use df.pivot to massage the DataFrame into a form amenable to calling DataFrame.plot(kind='area'). For example, if

In [46]: df
Out[46]: 
   A  B     C
0  1  1  0.00
1  1  2  0.00
2  1  3  0.00
3  1  4  0.00
4  2  1  0.50
5  2  2  0.20
6  2  4  0.15
7  3  1  0.70
8  3  3  0.10
9  3  4  0.20

then

In [47]: df.pivot(columns='B', index='A')
Out[47]: 
     C                
B    1    2    3     4
A                     
1  0.0  0.0  0.0  0.00
2  0.5  0.2  NaN  0.15
3  0.7  NaN  0.1  0.20

Notice that df.pivot inserts NaN for every (A, B) combination that is missing from the data. Now, with the DataFrame in this form (called result in the full script below),

result.plot(kind='area')

produces the desired plot.
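If you would rather keep calling Matplotlib's stackplot directly, the same pivoted table can be passed to it once the NaN entries are replaced. The following is only a minimal sketch: the names csv_text and wide are illustrative, and it assumes that a missing sample should count as zero, which may not suit every data set.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question, without the blank placeholder lines.
csv_text = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per A, one column per B; absent (A, B) pairs become NaN.
wide = df.pivot(index="A", columns="B", values="C")

# Assumption: a missing sample means "no contribution", so treat it as 0.
wide = wide.fillna(0)

fig, ax = plt.subplots()
ax.stackplot(
    wide.index.to_numpy(),
    wide.to_numpy().T,  # stackplot expects one row per stacked series
    labels=["B:{0}".format(b) for b in wide.columns],
)
ax.set_xlabel("A")
ax.set_ylabel("C")
ax.legend(loc="upper left")
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. Every series now has one value per A, so stackplot no longer receives ragged input.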


import pandas as pd
import matplotlib.pyplot as plt

try:
    # for Python2
    from cStringIO import StringIO
except ImportError:
    # for Python3
    from io import StringIO


df = pd.read_csv(StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))


result = df.pivot(columns='B', index='A')
result.columns = result.columns.droplevel(0)
# Alternatively, the above two lines are equivalent to
# result = df.set_index(['A','B'])['C'].unstack('B')

ax = result.plot(kind='area')
lines, labels = ax.get_legend_handles_labels()
ax.set_ylabel('C')
ax.legend(lines, ['B:{0}'.format(b) for b in result.columns], loc='best')

plt.show()

yields the desired stacked area plot.
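The question also asks how to "insert" zero records while keeping the original long (A, B, C) layout. One possible approach, sketched here under the assumption that every combination of the observed A and B values should exist, is to reindex against the full Cartesian product. The names full_index and filled are illustrative only.

```python
import io

import pandas as pd

csv_text = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Every (A, B) pair that should be present, built from the observed values.
full_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# Reindex on (A, B); pairs absent from the data get C = 0.
filled = (
    df.set_index(["A", "B"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)

print(filled)
```

With filled in place of df, each B value has exactly one row per A, so the list comprehension in the question's stackplot call should produce equal-length series.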

unutbu