I have data in the following format:

|      | Measurement 1 |      | Measurement 2 |      |
|------|---------------|------|---------------|------|
|      | Mean          | Std  | Mean          | Std  |
| Time |               |      |               |      |
| 0    | 17            | 1.10 | 21            | 1.33 |
| 1    | 16            | 1.08 | 21            | 1.34 |
| 2    | 14            | 0.87 | 21            | 1.35 |
| 3    | 11            | 0.86 | 21            | 1.33 |

I am using the following code to generate a matplotlib line graph from this data, with the standard deviation shown as a filled-in area:

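import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def seconds_to_minutes(x, pos):
    # Tick formatter: convert a tick position in seconds to minutes
    minutes = f'{round(x/60, 0)}'
    return minutes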

fig, ax = plt.subplots()
mean_temperature_over_time['Measurement 1']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 1']['std'], alpha=0.15, ax=ax)
mean_temperature_over_time['Measurement 2']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 2']['std'], alpha=0.15, ax=ax)

ax.set(title="A Line Graph with Shaded Error Regions", xlabel="x", ylabel="y")
formatter = FuncFormatter(seconds_to_minutes)
ax.xaxis.set_major_formatter(formatter)
ax.grid()
ax.legend(['Mean 1', 'Mean 2'])

Output:

[Image: output graph]

This seems like a very messy solution, and it only produces shaded output because I have so much data. What is the correct way to produce a line graph with shaded error regions from the dataframe I have? I've looked at "Plot yerr/xerr as shaded region rather than error bars", but I am unable to adapt it to my case.

LarsaSolidor

1 Answer

What's wrong with the linked solution? It seems pretty straightforward.

Allow me to rearrange your dataset so it's easier to load into a pandas DataFrame:

   Time  Measurement  Mean   Std
0     0            1    17  1.10
1     1            1    16  1.08
2     2            1    14  0.87
3     3            1    11  0.86
4     0            2    21  1.33
5     1            2    21  1.34
6     2            2    21  1.35
7     3            2    21  1.33


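import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for i, m in df.groupby("Measurement"):
    ax.plot(m.Time, m.Mean)  # mean line for this measurement
    ax.fill_between(m.Time, m.Mean - m.Std, m.Mean + m.Std, alpha=0.35)  # +/- 1 Std band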
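
For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. It assumes the rearranged table above is typed in by hand as `df`; the legend labels and axis names are illustrative choices, not part of the original data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Long-format data copied from the rearranged table above.
df = pd.DataFrame({
    "Time":        [0, 1, 2, 3, 0, 1, 2, 3],
    "Measurement": [1, 1, 1, 1, 2, 2, 2, 2],
    "Mean":        [17, 16, 14, 11, 21, 21, 21, 21],
    "Std":         [1.10, 1.08, 0.87, 0.86, 1.33, 1.34, 1.35, 1.33],
})

fig, ax = plt.subplots()
for name, m in df.groupby("Measurement"):
    # One line per measurement, plus a band one standard deviation either side.
    ax.plot(m["Time"], m["Mean"], label=f"Mean {name}")
    ax.fill_between(m["Time"], m["Mean"] - m["Std"], m["Mean"] + m["Std"], alpha=0.35)

ax.set(xlabel="Time", ylabel="Mean")  # illustrative axis labels
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure.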

[Image: the resulting plot for the data above]

And here's the result with some randomly generated data:

[Image: the resulting plot for randomly generated data]

EDIT

Since the issue is apparently iterating over your particular dataframe format, let me show how you could do it (I'm new to pandas, so there may be better ways). If I understood your screenshot correctly, you should have something like:

Measurement    1          2      
            Mean   Std Mean   Std
Time                             
0             17  1.10   21  1.33
1             16  1.08   21  1.34
2             14  0.87   21  1.35
3             11  0.86   21  1.33

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
(1, Mean)    4 non-null int64
(1, Std)     4 non-null float64
(2, Mean)    4 non-null int64
(2, Std)     4 non-null float64
dtypes: float64(2), int64(2)
memory usage: 160.0 bytes

df.columns
MultiIndex(levels=[[1, 2], [u'Mean', u'Std']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'Measurement', None])
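
If you want to experiment without the original data, a frame with this column layout can be built by hand. This is only a sketch: it assumes the outer column level is named "Measurement" and the inner one is unnamed, as in the output above, and the variable names are my own.

```python
import pandas as pd

# Two-level column index: outer level named "Measurement", inner level unnamed.
columns = pd.MultiIndex.from_product([[1, 2], ["Mean", "Std"]],
                                     names=["Measurement", None])
df = pd.DataFrame(
    [[17, 1.10, 21, 1.33],
     [16, 1.08, 21, 1.34],
     [14, 0.87, 21, 1.35],
     [11, 0.86, 21, 1.33]],
    index=pd.Index([0, 1, 2, 3], name="Time"),
    columns=columns,
)

# Selecting an outer label returns a plain frame with "Mean" and "Std" columns.
first = df[1]
```

Selecting `df[1]` drops the outer level, so `first["Mean"]` and `first["Std"]` are ordinary Series indexed by Time.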

And you should be able to iterate over its top column level and obtain the same plot:

for name in df.columns.get_level_values("Measurement").unique():
    m = df[name]  # sub-frame with "Mean" and "Std" columns, indexed by Time
    ax.plot(m.index, m['Mean'])
    ax.fill_between(m.index,
                    m['Mean'] - m['Std'],
                    m['Mean'] + m['Std'], alpha=0.35)

Or you could restack it to the long format above with:

(df.stack("Measurement")      # stack "Measurement" columns row by row
 .reset_index()               # make "Time" a normal column, add a new index
 .sort_values("Measurement")  # group values from the same Measurement
 .reset_index(drop=True))     # drop sorted index and make a new one
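
To see what each step of that chain does, it can help to run the steps one at a time. The sketch below assumes a hand-built wide frame with the column layout shown earlier; the names `wide`, `step1`, `step2` and `long_df` are illustrative. It sorts by both "Measurement" and "Time" (a small deviation from the chain above) so that the row order is deterministic.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([[1, 2], ["Mean", "Std"]],
                                     names=["Measurement", None])
wide = pd.DataFrame(
    [[17, 1.10, 21, 1.33],
     [16, 1.08, 21, 1.34],
     [14, 0.87, 21, 1.35],
     [11, 0.86, 21, 1.33]],
    index=pd.Index([0, 1, 2, 3], name="Time"),
    columns=columns,
)

# Step 1: move the "Measurement" column level into the row index.
# The index becomes (Time, Measurement); the columns are "Mean" and "Std".
step1 = wide.stack("Measurement")

# Step 2: turn both index levels into ordinary columns.
step2 = step1.reset_index()

# Step 3: group rows of the same measurement together and renumber them.
long_df = step2.sort_values(["Measurement", "Time"]).reset_index(drop=True)
```

`long_df` then has the columns Time, Measurement, Mean and Std, which is the shape the `groupby("Measurement")` loop expects.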
filippo
  • Ah, the problem was that the dataset is the result of a pandas .groupby.agg method. I don't know how to get it from that structure to the one you've used to generate the graph the way I want. – LarsaSolidor Jun 01 '18 at 11:07
  • could you post a sample of the dataset in this format? just like `print(df)` prints it – filippo Jun 01 '18 at 11:10
  • I've taken a screenshot of the dataset in the format after the .groupby.agg() method: [https://i.imgur.com/JIfPYqZ.png](https://i.imgur.com/JIfPYqZ.png) – LarsaSolidor Jun 01 '18 at 11:53
  • @LarsaSolidor you could get to my format with `mean_temperature_over_time.swaplevel(0,1,axis=1).stack().reset_index().sort_values("Measurement")` or you can iterate in your dataframe with something like `for i, m in mean_temperature_over_time.groupby(level=0, axis=1): print(m[i].Mean)` – filippo Jun 01 '18 at 12:54
  • @LarsaSolidor updated the answer, see the edit please – filippo Jun 01 '18 at 21:37
  • That worked perfectly! Could you add some explanatory comments describing what you're doing? I can kind of see why it works but I don't understand enough to be confident in explaining to someone else. Thank you for your help! – LarsaSolidor Jun 04 '18 at 08:40
  • @LarsaSolidor updated the answer with a slightly easier to grasp method (no manual level addressing and swapping). The trick with chained pandas functions is to start from the first one, see how it reshapes the dataframe, and add the remaining steps one by one to better understand what's going on – filippo Jun 04 '18 at 10:04
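
The structure discussed in the comments (the result of a `.groupby(...).agg(...)` call) can also be plotted directly, without reshaping. The following is only a sketch: the raw readings are hypothetical values made up for illustration, and it assumes the aggregation was done with `agg(["mean", "std"])`, which gives lowercase "mean" and "std" inner column labels as in the question's code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw readings: three samples per time step for each measurement.
raw = pd.DataFrame({
    "Time":          [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "Measurement 1": [16.0, 17.0, 18.0, 15.0, 16.0, 17.0, 13.0, 14.0, 15.0],
    "Measurement 2": [20.0, 21.0, 22.0, 20.0, 21.0, 22.0, 20.0, 21.0, 22.0],
})

# Aggregating with a list of functions yields two-level columns:
# (measurement name, "mean") and (measurement name, "std").
stats = raw.groupby("Time").agg(["mean", "std"])

fig, ax = plt.subplots()
for name in stats.columns.get_level_values(0).unique():
    m = stats[name]  # sub-frame with "mean" and "std" columns, indexed by Time
    ax.plot(m.index, m["mean"], label=name)
    ax.fill_between(m.index, m["mean"] - m["std"], m["mean"] + m["std"], alpha=0.35)
ax.legend()
```

Because the loop selects each outer column label in turn, it works for any number of measurement columns.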