I would like to use Python to draw a mosaic | marimekko chart with custom colors and labels.

The following code works fine:

import plotly.graph_objects as go

year = ['2019', '2020', '2021', '2022']

fig1 = go.Figure()

fig1.add_trace(go.Bar(x=year, y=[20, 18, 14, 10], text=['20', '18', '14', '10'], name='brand 1'))
fig1.add_trace(go.Bar(x=year, y=[10, 15, 20, 22], text=['10', '15', '20', '22'], name='brand 2'))
fig1.add_trace(go.Bar(x=year, y=[6,   8, 10, 12], text=[ '6',  '8', '10', '12'], name='brand 3'))

fig1.update_layout(barmode='stack')

fig1.write_image('test_1.png')

However, I want to sort the brands within each year by their y value. The code would then look like this (I'll leave out the sorting, since that is not the question here):

fig2.add_trace(go.Bar(x=year, y=[20, 18, 20, 22], text=['20: brand 1', '18: brand 1', '20: brand 2', '22: brand 2']))
fig2.add_trace(go.Bar(x=year, y=[10, 15, 14, 12], text=['10: brand 2', '15: brand 2', '14: brand 1', '12: brand 3']))
fig2.add_trace(go.Bar(x=year, y=[ 6,  8, 10, 10], text=[ '6: brand 3',  '8: brand 3', '10: brand 3', '10: brand 1']))

Of course, I still want to use the same colors per brand (not per position), so in addition to the appropriately sorted data, I need to pass two more arrays for custom label texts (works fine) and for the corresponding custom colors (I don't see how to do that).

Question 1: How can I pass an array of custom colors to each trace so that each brand always gets the same color? Is there anything like

fig1.add_trace(go.Bar(x=year, y=[20, 18, 14, 10], colors=...))

Question 2: Is there another option to create a mosaic | marimekko chart with varying x-widths which is not based on plotly?

The expected code is something like

# the color map 
the_brand_cmap = plt.get_cmap('seismic_r') 
the_brand_norm = co.TwoSlopeNorm(vmin=-max_abs, vcenter=0, vmax=max_abs)

...

for i in years: # the loop is over the years, not over the brands!

    # some more code to sort df per year and to extract the brand names and colors per year

    fig1.add_trace(go.Bar( # this adds a trace for the i-th year
        x=np.cumsum(xwidths) - xwidths,
        y=ysizes_norm, 
        width=xwidths,
        marker_color=the_brand_cmap(the_brand_norm(colors)), # the colors for each year
        text=brand_name))

The expected result is:

[Image: the expected chart]

TomS

1 Answer


I created a Marimekko chart from your data, based on the examples in the reference. For each year, add a new column holding each brand's percentage share of that year's total, and use the yearly totals as the column widths. To give each brand a fixed color, create a dictionary that maps brands to colors and look up the color when adding each brand's trace to the stacked chart.

import plotly.graph_objects as go
import numpy as np
import pandas as pd

year = ['2019', '2020', '2021', '2022']
data = {'brand 1': [20, 18, 14, 10],
       'brand 2': [10, 15, 20, 22],
       'brand 3': [6,   8, 10, 12]
       }

df = pd.DataFrame.from_dict(data)

df = df.T
df.columns = year
for c in df.columns:
    df[c+'_%'] = df[c].apply(lambda x: (x / df.loc[:,c].sum()) * 100)

widths = np.array([sum(df['2019']), sum(df['2020']), sum(df['2021']), sum(df['2022'])])
marker_colors = {'brand 1': 'darkblue', 'brand 2': 'darkgreen', 'brand 3': 'crimson'}

fig1 = go.Figure()

for idx in df.index:
    # the percentage columns ('2019_%', ...) come after the four year columns
    shares = df.loc[idx, df.columns[4:]].to_numpy()
    fig1.add_trace(go.Bar(
        x=np.cumsum(widths) - widths,  # left edge of each year's column
        y=shares,
        width=widths,
        offset=0,  # draw each bar from its left edge instead of centering it on x
        marker_color=marker_colors[idx],
        text=['{:.2f}%'.format(s) for s in shares],
        name=idx
    ))

fig1.update_xaxes(
    tickvals=np.cumsum(widths) - widths / 2,  # tick at the center of each column
    ticktext=["%s<br>%d" % (l, w) for l, w in zip(year, widths)]
)

fig1.update_xaxes(range=[0, widths.sum()])
fig1.update_yaxes(range=[0, 100])

fig1.update_layout(barmode='stack')

#fig1.write_image('test_1.png')
fig1.show()

[Image: the resulting Marimekko chart]

Since the goal is to stack the brands in order of increasing value within each year, the outer loop should iterate over the years and the inner loop over the brands sorted in ascending order of value, so that the largest value ends up on top.

# Reuses df and year from the snippet above.
widths = np.array([sum(df['2019']), sum(df['2020']), sum(df['2021']), sum(df['2022'])])
marker_colors = {'brand 1': 'darkblue', 'brand 2': 'darkgreen', 'brand 3': 'crimson'}

lefts = np.cumsum(widths) - widths  # left edge of each year's column

fig = go.Figure()

for i, c in enumerate(df.columns[4:]):
    base = 0.0  # running height of the current year's stack
    # within each year, draw the smallest share first so the largest ends up on top
    for br, share in df[c].sort_values(ascending=True).items():
        fig.add_trace(go.Bar(
            x=[lefts[i]],
            y=[share],
            width=widths[i],
            offset=0,  # draw the bar from its left edge instead of centering it on x
            base=base,
            marker_color=marker_colors[br],
            text='{:.2f}%'.format(share),
            name=br
        ))
        base += share

# show each brand only once in the legend
names = set()
fig.for_each_trace(
    lambda trace:
        trace.update(showlegend=False)
        if (trace.name in names) else names.add(trace.name))

fig.update_xaxes(
    tickvals=np.cumsum(widths) - widths / 2,  # tick at the center of each column
    ticktext=["%s<br>%d" % (l, w) for l, w in zip(year, widths)]
)

fig.update_xaxes(range=[0, widths.sum()])
fig.update_yaxes(range=[0, 100])

fig.update_layout(barmode='stack')
fig.show()

[Image: the resulting chart, with brands sorted by value within each year]
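
The arithmetic behind the sorted stacking does not depend on plotly, which may also help with Question 2. The following is a minimal sketch in plain Python that computes one rectangle per box; the helper name `marimekko_rects` and the tuple layout are assumptions made for illustration, not part of any library.

```python
year = ["2019", "2020", "2021", "2022"]
data = {
    "brand 1": [20, 18, 14, 10],
    "brand 2": [10, 15, 20, 22],
    "brand 3": [6, 8, 10, 12],
}


def marimekko_rects(data, years):
    """Return (year, brand, left, bottom, width, height) tuples, one per box.

    Hypothetical helper: width is the year's total, height is the brand's
    share of that total in percent, and boxes are stacked smallest first.
    """
    rects = []
    left = 0.0
    for j, yr in enumerate(years):
        column = {brand: values[j] for brand, values in data.items()}
        total = sum(column.values())  # column width = total of the year
        bottom = 0.0
        # ascending order, so the largest share ends up on top
        for brand in sorted(column, key=column.get):
            height = 100.0 * column[brand] / total
            rects.append((yr, brand, left, bottom, float(total), height))
            bottom += height
        left += total  # the next column starts where this one ends
    return rects


rects = marimekko_rects(data, year)
```

Each tuple could then be handed to whichever library is preferred, for example matplotlib's `ax.bar(left, height, width=width, bottom=bottom, align="edge")` with a color looked up from a brand-to-color dictionary. That still draws one box per call, so it saves no loops, but it keeps the sorting logic separate from the drawing.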

r-beginners
  • Thanks. My problem is more involved b/c I want to sort the data per column by size. I have a solution, but there I need to draw all boxes individually. That's why I am asking for a solution that allows passing an **array of custom colors to each trace**. So in your case the code should look like fig1.add_trace(go.Bar(x=..., y=..., width=..., **marker_color=marker_colors**, ...)) where **marker_colors** is an **array of colors**. That's my question. – TomS Apr 18 '23 at 06:59
  • I enhanced your code and it seems that this array of colors is indeed allowed by the go.Bar() function. – TomS Apr 18 '23 at 07:05
  • My answer is that the same color is set for each brand. Is your comment that you want to have different colors for the same brand in different years? If so, I would add a loop process. If my understanding is different, could you please elaborate? You can also edit with my output graph and update your question. – r-beginners Apr 18 '23 at 09:01
  • The brand color is the same for every year, but the traces are not added brand-wise. For each year I sort the df acc. to the brand size. That means that the first trace corresponds to the largest value per year (which does not always come from the same brand), the second trace corresponds to the second largest value per year, and so on. I'll enhance my question. – TomS Apr 18 '23 at 10:32
  • I think we can draw the graph year by year in a loop, starting with the smallest value in each year, and use a dictionary of brand colors to set the marker color. The outer loop runs over the years and the inner loop over the brands in ascending order of value within that year. – r-beginners Apr 18 '23 at 11:57
  • This is what I did using matplotlib. I am using an outer year-loop, then do the sorting, then the inner brand-loop to draw the boxes. Unfortunately, I don't see how to get rid of one loop and have both varying heights and widths. Then I found out that plotly provides this functionality, but it seems that I am now stuck elsewhere ... Anyway, I have working code, and switching to another library to save a few lines of code is not my first priority. – TomS Apr 18 '23 at 12:25
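
Regarding Question 1 and the comments above: `marker_color` in `go.Bar` accepts a list with one color per bar, so a trace can represent a rank position rather than a brand. The following is a minimal sketch under that assumption, using the data from the question; the helper names `ranked_traces` and `build_figure` and the color choices are illustrative and not part of the original posts.

```python
year = ["2019", "2020", "2021", "2022"]
data = {
    "brand 1": [20, 18, 14, 10],
    "brand 2": [10, 15, 20, 22],
    "brand 3": [6, 8, 10, 12],
}
brand_colors = {"brand 1": "darkblue", "brand 2": "darkgreen", "brand 3": "crimson"}


def ranked_traces(data, brand_colors):
    """Return one dict per rank (largest first) with per-year y, text and colors."""
    n_years = len(next(iter(data.values())))
    traces = [{"y": [], "text": [], "colors": []} for _ in data]
    for j in range(n_years):
        # brands ordered by their value in year j, largest first
        ordered = sorted(data, key=lambda b: data[b][j], reverse=True)
        for rank, brand in enumerate(ordered):
            value = data[brand][j]
            traces[rank]["y"].append(value)
            traces[rank]["text"].append(f"{value}: {brand}")
            traces[rank]["colors"].append(brand_colors[brand])
    return traces


def build_figure(year, traces):
    """Stack one go.Bar trace per rank; each bar is colored by its brand."""
    # imported here so that ranked_traces above stays free of dependencies
    import plotly.graph_objects as go

    fig = go.Figure()
    for t in traces:
        fig.add_trace(go.Bar(
            x=year,
            y=t["y"],
            text=t["text"],
            marker_color=t["colors"],  # one color per bar, not per trace
            showlegend=False,  # a trace no longer corresponds to a single brand
        ))
    fig.update_layout(barmode="stack")
    return fig


traces = ranked_traces(data, brand_colors)
```

Calling `build_figure(year, traces).show()` should then render the sorted stacks with a fixed color per brand. The legend is switched off because each trace now mixes brands; the brand name is carried in the bar text instead. Varying column widths could presumably be added with the `width` and `offset` arguments of `go.Bar`, as in the answer.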