Subplotting of Pandas.DataFrameGroupBy[group_name] does not yield expected results

Question

This is a re-opening of my initial question with the same title, which was closed as a duplicate. As none of the suggested duplicates helped me solve my problem, I am posting the question again.

I have a DataFrame with time series for several devices, read from an HDF file:

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame


def open_dataset(file_name: str, name: str, combined_frame: DataFrame):
    data_set: DataFrame = pd.read_hdf(file_name, key=name)
    data_set['name'] = name
    combined_frame = pd.concat([combined_frame, data_set], axis=0)
    return combined_frame


if __name__ == '__main__':

    names = ['YRT1IN1E', 'YRT1LE1', 'YRT1MH1', 'YR08DT1ML']

    working_frame = DataFrame()

    for name in names:
        working_frame = open_dataset('data.h5', name, working_frame)

    grouped_frame = working_frame.groupby('name')


    fig, axs = plt.subplots(figsize=(10, 5),
                        nrows=4, ncols=1,  # fix as above
                        gridspec_kw=dict(hspace=0), sharex=True)

    axs = grouped_frame.get_group('YR08DT1ML').rawsum.plot()
    axs = grouped_frame.get_group('YRT1LE1').voltage.plot()
    axs = grouped_frame.get_group('YRT1MH1').current.plot()
    axs = grouped_frame.get_group('YRT1IN1E').current.plot()

    plt.show()

This produces the following output:

What am I doing wrong? I would like to have each plot in its own row, not all of them in one row.

The data file "data.h5" is available at: Google Drive

What I tried from the suggested posts:

The answer by joris (Mar 18, 2014 at 15:45) causes the code to go into an infinite loop; the data is never plotted:

fig, axs = plt.subplots(nrows=2, ncols=2)
grouped_frame.get_group('YR08DT1ML').rawsum.plot(ax=axs[0,0])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[0,1])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[1,0])
grouped_frame.get_group('YR...').rawsum.plot(ax=axs[1,1])

A variation leads to the same result as described above:

axes[0,0] = grouped_frame.get_group('YR08DT1ML').rawsum.plot()
axes[0,1] = grouped_frame.get_group('YR...').rawsum.plot()
...

The infinite loop also happens with sedeh's answer (Jun 4, 2015 at 15:26):

grouped_frame.get_group('YR08DT1ML').rawsum.plot(subplots=True, layout=(1,2))
...

The infinite loop also happens with Justice_Lords' answer (Mar 15, 2019 at 7:26):

fig=plt.figure()
ax1=fig.add_subplot(4,1,1)
ax2=fig.add_subplot(4,1,2)
ax3=fig.add_subplot(4,1,3)
ax4=fig.add_subplot(4,1,4)

grouped_frame.get_group('YR08DT1ML').rawsum.plot(ax=ax1)
...

It seems to me that the problem is related to the fact that I plot from a pandas.DataFrameGroupBy and not from a pandas.DataFrame.
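A quick way to test this suspicion is sketched below on a small synthetic frame (the values are made up and the frame is an assumption; only the column names mirror the data above). As far as I can tell, get_group() returns an ordinary DataFrame, so the GroupBy itself may not be the culprit. What the sketch does show is that Series.plot() called without ax= draws on whatever axes is current, and that assigning its return value to a name such as axs only rebinds that name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the HDF data (assumption); values are invented.
frame = pd.DataFrame({
    "name": ["YRT1LE1"] * 3 + ["YRT1MH1"] * 3,
    "voltage": [1.0, 2.0, 3.0, 0.0, 0.0, 0.0],
    "current": [0.0, 0.0, 0.0, 4.0, 5.0, 6.0],
})
grouped = frame.groupby("name")

# get_group() hands back a plain DataFrame, not a GroupBy object.
group = grouped.get_group("YRT1LE1")
print(type(group).__name__)

fig, axs = plt.subplots(nrows=2, ncols=1, sharex=True)

# Without ax=, Series.plot() uses the current axes and returns it.
first = grouped.get_group("YRT1LE1").voltage.plot()
second = grouped.get_group("YRT1MH1").current.plot()

print(first is second)     # were both series drawn on the same axes?
print(len(axs[0].lines))   # number of lines that ended up in the top row
```

If both calls report the same axes and the top row stays empty, that would match the picture of all curves landing in a single row.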

fishmulch · Accepted Answer · 2022-07-16T20:27:39.283

Seems like matplotlib was taking a long time to process the DatetimeIndex. Converting the index to time-of-day values and cleaning everything up did the trick:

names = ['YR08DT1ML', 'YRT1LE1', 'YRT1MH1', 'YRT1IN1E']
df = pd.concat([pd.read_hdf('data.h5', name) for name in names])

df.reset_index(inplace=True)
df.index = df['time'].dt.time
df.sort_index(inplace=True)

fig, axes = plt.subplots(figsize=(10, 5), nrows=4, ncols=1, gridspec_kw=dict(hspace=0), sharex=True)

cols = ['rawsum', 'voltage', 'current', 'current']

for ix, name in enumerate(names):
    df.loc[df['nomen'].eq(name), cols[ix]]\
        .plot(ax=axes[ix])

plt.show()
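To make the index conversion easier to follow, here is a minimal sketch on a tiny synthetic frame. It assumes, as the snippet above does, that the original index is a DatetimeIndex named time; the values are invented.

```python
import pandas as pd

# Synthetic frame with a DatetimeIndex named "time" (an assumption about data.h5).
index = pd.date_range("2022-07-16 08:00:00", periods=4, freq="30min", name="time")
df = pd.DataFrame({"rawsum": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Same steps as above: move the index into a column,
# then index by time of day only and sort.
df = df.reset_index()
df.index = df["time"].dt.time
df = df.sort_index()

print(df.index.dtype)  # dtype of the new index
print(df.index[0])     # first entry of the new index
```

Note that .dt.time drops the date part, so samples recorded at the same clock time on different days would share one index value.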

Hope this helps.

edited Jul 16 '22 at 20:27

answered Jul 16 '22 at 16:41

Thanks for your answer. Did you try to get a full example running with my code and data? I can't get it to work; I only get a plot with four empty lanes. Also, I don't understand why the aggregate function is needed when I'd like to plot the raw data. Sorry, it seems I need to boost my DataFrame skills a bit ;) – WolfiG Jul 16 '22 at 19:55
There were a bunch of little things so not summarizing but hopefully the functional code above helps. Sorry for posting slightly untested code earlier ;) – fishmulch Jul 16 '22 at 20:28
That definitely helped, thanks – WolfiG Jul 16 '22 at 21:42

score 0 · Answer 2 · answered Jul 18 '22 at 08:28

Thanks to @fishmulch's answer I found a way to do what I wanted. However, I also want to answer my initial question of how to plot the "groupby" data set. The following __main__ block produces the desired output with the input file data.h5:

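from matplotlib import gridspec  # needed for GridSpec; other imports and open_dataset as in the question

if __name__ == '__main__':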
    names = ['YRT1IN1E', 'YRT1LE1', 'YRT1MH1', 'YR08DT1ML']

    working_frame = DataFrame()
    for name in names:
        working_frame = open_dataset('data.h5', name, working_frame)

    grouped_frame = working_frame.groupby('name')

    fig = plt.figure(1)
    gs = gridspec.GridSpec(4, 1)
    gs.update(wspace=0.0, hspace=0.0)  # set the spacing between axes.

    cols = ['current', 'voltage', 'current', 'rawsum']

    row = 0
    for name, col in zip(names, cols):
        data = grouped_frame.get_group(name)
        if row == 0:
            ax = fig.add_subplot(gs[row])
        else:
            ax = fig.add_subplot(gs[row], sharex=ax)
        ax.plot(data.get(col))
        row += 1

    plt.show()

... some beautification still needed ...
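For comparison, the same layout can probably be had from plt.subplots by passing each row's axes explicitly via ax=. This is only a sketch under assumptions: the frame is synthetic (random numbers in place of data.h5, with a plain integer index), and the y-labels are my own choice of beautification.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for data.h5 (assumption): one block of rows per device.
names = ["YRT1IN1E", "YRT1LE1", "YRT1MH1", "YR08DT1ML"]
cols = ["current", "voltage", "current", "rawsum"]
rng = np.random.default_rng(0)
frames = []
for name in names:
    part = pd.DataFrame(rng.normal(size=(50, 3)),
                        columns=["current", "voltage", "rawsum"])
    part["name"] = name
    frames.append(part)
grouped_frame = pd.concat(frames).groupby("name")

fig, axs = plt.subplots(figsize=(10, 5), nrows=4, ncols=1,
                        gridspec_kw=dict(hspace=0), sharex=True)

# One device per row: hand the target axes to pandas via ax=.
for ax, name, col in zip(axs, names, cols):
    grouped_frame.get_group(name)[col].plot(ax=ax)
    ax.set_ylabel(f"{name}\n{col}", fontsize=8)  # label each row

fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

With the real data, the DatetimeIndex handling discussed in the accepted answer would still apply.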
