1

I have a pandas dataframe that roughly looks like this:

df = pd.DataFrame(data, columns=["property_1", "property_2", "value"], index=my_index)

my_index    property_1    property_2    value
<1, 1, 1>   "A"           "X"           ...
<1, 1, 1>   "A"           "Y"           ...
<1, 1, 2>   "A"           "X"           ...
<1, 1, 4>   "A"           "X"           ...
<1, 1, 4>   "A"           "Y"           ...
<1, 1, 4>   "B"           "X"           ...
<1, 1, 4>   "B"           "Y"           ...

I'm wanting to produce a grouped bar chart like this:

desired group bar chart

This is quite complicated, but basically:

  1. I need to reduce my_index to the unique indices that have a value for each combination of property_1 and property_2
  2. I need to find the unique combinations of property_1 AND property_2, not just the unique values of each column individually!
  3. I am trying to group them primarily by my_index, and then by the combination of property_1 and property_2

I would have guessed that the way of going about this is by having a dataframe with the following layout:

my_index    A-X    A-Y    B-X    B-Y    ... 
<1, 1, 1>   ...    ...    NaN    NaN    ...
<1, 1, 2>   ...    ...    NaN    NaN    ...

And so on. Then one could drop the columns with any NaN values in it. You could then just call df.plot.bar(...) on that resulting dataframe.

But I am not sure how to rearrange these rows into the columns in this way. Does anyone have any ideas?

EDIT: I should note that I don't need an answer in pandas, I am just asking if there is one. If not, I can wrangle the data out myself. But maybe pandas has a nifty one-liner for making this kind of work easier.

Daniel Soutar
  • 827
  • 1
  • 10
  • 24
  • Please include a sample for your *data* object for a [MCVE]. See also [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait Oct 05 '18 at 15:32

1 Answers1

0

I might be wrongly understood your problem. But let me suggest some steps that might help you out.

First, add one column unique_property from columns property_1 and property_2, and (if needed) drop the two latter.

df[`unique_property`] = df.property_1 + df.property_2
df.drop(['property_1', 'property_2'], axis=1, inplace=True)

Then we can plot the dataframe, grouped by my_index and unique_property.

fig, ax = plt.subplots()
ax.set_xticks(df['my_index'].unique()) # this sets x axis.
df.groupby(["my_index", "unique_property"]).mean()['value'].unstack().plot(kind='bar', ax=ax)

Explanation for the last line.

df.groupby(["my_index", "unique_property"]).mean()['value']

Above code will give us Series of value, grouped by my_index and unique_property. If we directly plot it, then we will get all combination of unique values in (my_index, unique_property) as x axis. That's why we need to

unstack() it.

It's like the unique values in my_index become rows and the unique values in unique_property become columns.

By default, it will produce NaN value for incomplete data, e.g. for my_index = <1,1,1> there are only AX and AY that have value, then BX and BY will be filled by NaN value. We can for example replace NaN by some_value, then unstack(fill_value=some_value).

ipramusinto
  • 2,310
  • 2
  • 14
  • 24