
This is starting to bug me: in plotly express, when using animation_frame, I know it's important to set ranges so data is displayed consistently; otherwise data may vanish across frames. But for a column with categorical values (say 'US', 'Russia', 'Germany') that I want mapped to color (in the code below, that column would be 'AnotherColumn'), I cannot find any way to avoid disappearing data when not every frame contains all categories. The Plotly documentation points out:

Animations are designed to work well when each row of input is present across all animation frames, and when categorical values mapped to symbol, color and facet are constant across frames. Animations may be misleading or inconsistent if these constraints are not met.

but while I can easily set a range_color when I have a continuous color range, nothing of the sort seems to work for categorical data. I can somewhat work around this by making my data numerical (e.g. 'US' -> 1, 'Russia' -> 2), but that is both fiddly and visually unappealing.

import plotly.express as px

... 

fig = px.bar(data, x="NameColumn",
             y="SomeColumn",
             color="AnotherColumn",
             animation_frame="AnimationColumn",
             range_y=[0, max_y]
             )

Here is a simple reproducible example:

import pandas as pd
import plotly.express as px


data_dict = {'ColorColumn': ['p', 'p', 'p', 'q'],
             'xColumn': ['someName', 'someOtherName', 'someName', 'someOtherName'],
             'yColumn': [10, 20, 30, 40],
             'animationColumn': [1, 1, 2, 2]}

data = pd.DataFrame(data=data_dict)

fig = px.bar(data, x="xColumn",
             y="yColumn",
             color="ColorColumn",
             animation_frame="animationColumn",
             range_y=[0, 40]
             )
fig.update_layout(xaxis={'title': '',
                        'visible': True,
                        'showticklabels': True})



fig.show()

If you try it out, you'll notice the second frame is missing a bar. If the ColorColumn had numeric data, you could fix this by specifying range_color (similar to the specification of range_y in the code above); my question would be, how to handle this with categorical data?
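
A quick way to see which frames are affected is to list the categories each frame lacks. The sketch below uses only pandas on the example data above; the names all_categories and missing_per_frame are illustrative and not part of plotly.

```python
import pandas as pd

data = pd.DataFrame({'ColorColumn': ['p', 'p', 'p', 'q'],
                     'xColumn': ['someName', 'someOtherName', 'someName', 'someOtherName'],
                     'yColumn': [10, 20, 30, 40],
                     'animationColumn': [1, 1, 2, 2]})

# Every category that occurs anywhere in the color column
all_categories = set(data['ColorColumn'])

# Categories that actually occur within each animation frame
present = data.groupby('animationColumn')['ColorColumn'].unique()

# For each frame, the categories that are absent from it
missing_per_frame = {int(frame): sorted(all_categories - set(cats))
                     for frame, cats in present.items()}
print(missing_per_frame)
```

Here the first frame contains no 'q' rows at all, so the color categories are not constant across frames, which is the situation the documentation quote warns about.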

Second edit: Some requested additional data or a more reasonable example. This might be more appropriate:

import pandas as pd
import plotly.express as px


data_dict = {'Region': ['North America', 'Asia', 'Asia',
                        'North America', 'Asia', 'Europe',
                        'North America', 'Europe', 'Asia'],
             'Country': ['US', 'China', 'Korea',
                         'US', 'Phillipines', 'France',
                         'Canada', 'Germany', 'Thailand'],
             'GDP': [10, 20, 30,
                     40, 50, 60,
                     70, 80, 90],
             'Year': [2017, 2017, 2017,
                      2018, 2018, 2018,
                      2019, 2019, 2019]}

data = pd.DataFrame(data=data_dict)

fig = px.bar(data, x="Country",
             y="GDP",
             color="Region",
             animation_frame="Year",
             range_y=[0, 80]
             )
fig.update_layout(xaxis={'title': '',
                        'visible': True,
                        'showticklabels': True})



fig.show() 
vestland
Joseph Doob

2 Answers


The following is not a direct answer to your question (as in, what you need to change in plotly), but rather focuses on making the data in your DataFrame consistent.

The basic idea is that the "primary key" of each row in your second example is ["Year", "Country"]. plotly will expect a value for "GDP" as well as "Region" for each combination of those. The following creates a DataFrame that looks just like that (by reindexing with a MultiIndex).

unique_years = data["Year"].unique()
unique_countries = data["Country"].unique()

# Let's first separate the region of a country
region_per_country = data[["Country", "Region"]].drop_duplicates().set_index("Country")

# Removing the region
data = data[["Year", "Country", "GDP"]].set_index(["Year", "Country"])

# Creating all possible "Year" "Country" combinations
data = data.reindex(pd.MultiIndex.from_product([unique_years, unique_countries]))

# Cleanup
data = data.reset_index().rename(columns={"level_0": "Year", "level_1": "Country"})

# Re-adding the region
data = data.merge(region_per_country, left_on="Country", right_index=True)

Running this gives us the following DataFrame (shown without the .reset_index()):

                   GDP         Region
Year Country                         
2017 Canada        NaN  North America
     China        20.0           Asia
     France        NaN         Europe
     Germany       NaN         Europe
     Korea        30.0           Asia
     Phillipines   NaN           Asia
     Thailand      NaN           Asia
     US           10.0  North America
2018 Canada        NaN  North America
     China         NaN           Asia
     France       60.0         Europe
     Germany       NaN         Europe
     Korea         NaN           Asia
     Phillipines  50.0           Asia
     Thailand      NaN           Asia
     US           40.0  North America
2019 Canada       70.0  North America
     China         NaN           Asia
     France        NaN         Europe
     Germany      80.0         Europe
     Korea         NaN           Asia
     Phillipines   NaN           Asia
     Thailand     90.0           Asia
     US            NaN  North America

which plotly will then correctly plot.
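
To get a feel for how much padding this involves, the missing combinations can be counted before reindexing. This is a minimal sketch using only pandas on the question's second example; full_index, existing and missing are illustrative names.

```python
import pandas as pd

data = pd.DataFrame({
    'Region': ['North America', 'Asia', 'Asia',
               'North America', 'Asia', 'Europe',
               'North America', 'Europe', 'Asia'],
    'Country': ['US', 'China', 'Korea',
                'US', 'Phillipines', 'France',
                'Canada', 'Germany', 'Thailand'],
    'GDP': [10, 20, 30, 40, 50, 60, 70, 80, 90],
    'Year': [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019]})

# Every (Year, Country) pair a fully consistent animation would need
full_index = pd.MultiIndex.from_product(
    [data["Year"].unique(), data["Country"].unique()],
    names=["Year", "Country"])

# Pairs that actually occur in the original data
existing = pd.MultiIndex.from_frame(data[["Year", "Country"]])

# Pairs absent from the original data; reindexing fills these with NaN
missing = full_index.difference(existing)
print(len(full_index), len(existing), len(missing))
```

With 3 years and 8 countries there are 24 combinations, of which only 9 appear in the input, so the reindexed frame carries 15 NaN rows for GDP.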

BStadlbauer

A similar question has been asked and answered under Plotly: How to specify categorical x-axis elements in a plotly express animation?. The necessary adjustments for your use case aren't exactly straightforward though, so I might as well set it up for you.

It all boils down to this setup using, among other things:

df['key']=df.groupby(['Year','Country']).cumcount()
df1 = pd.pivot_table(df,index='Year',columns=['key', 'Country'],values='GDP')

And:

df1 = pd.merge(df1, data[['Country', 'Region']], how='left', on='Country').drop_duplicates()

Using some neat properties of pd.pivot_table, this will give you a dataset that has all years and all countries for all regions, even where no GDP has been specified.

The first two animation frames will look like this:

[image: first animation frame]

[image: second animation frame]

Complete code:

import pandas as pd
import plotly.express as px


data_dict = {'Region': ['North America', 'Asia', 'Asia',
                        'North America', 'Asia', 'Europe',
                        'North America', 'Europe', 'Asia'],
             'Country': ['US', 'China', 'Korea',
                         'US', 'Phillipines', 'France',
                         'Canada', 'Germany', 'Thailand'],
             'GDP': [10, 20, 30,
                     40, 50, 60,
                     70, 80, 90],
             'Year': [2017, 2017, 2017,
                      2018, 2018, 2018,
                      2019, 2019, 2019]}

data = pd.DataFrame(data=data_dict)

# data munging
df = data.copy()
df['key']=df.groupby(['Year','Country']).cumcount()

df1 = pd.pivot_table(df,index='Year',columns=['key', 'Country'],values='GDP')
df1 = df1.stack(level=[0,1],dropna=False).reset_index()

df1 = pd.merge(df1, data[['Country', 'Region']], how='left', on='Country').drop_duplicates()
df1.columns=['Year', 'Key', 'Country', 'GDP', 'Region']

fig = px.bar(df1, x="Country",
             y="GDP",
             color="Region",
             animation_frame="Year",
             range_y=[0, 80]
             )
fig.update_layout(xaxis={'title': '',
                        'visible': True,
                        'showticklabels': True})

fig.show()
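
The reshaping idea can also be inspected on its own, without plotting. The sketch below is a variation, not the exact code above: it pivots on Country only and goes back to long form with melt instead of stack(dropna=False). The names wide, long_df and regions are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'Region': ['North America', 'Asia', 'Asia',
               'North America', 'Asia', 'Europe',
               'North America', 'Europe', 'Asia'],
    'Country': ['US', 'China', 'Korea',
                'US', 'Phillipines', 'France',
                'Canada', 'Germany', 'Thailand'],
    'GDP': [10, 20, 30, 40, 50, 60, 70, 80, 90],
    'Year': [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019]})

# Wide table: one row per Year, one column per Country, NaN where no GDP was given
wide = data.pivot_table(index="Year", columns="Country", values="GDP")

# Back to long form; melt keeps the NaN cells, so every Year/Country pair survives
long_df = wide.reset_index().melt(id_vars="Year", var_name="Country", value_name="GDP")

# Attach each country's region again
regions = data[["Country", "Region"]].drop_duplicates()
long_df = long_df.merge(regions, on="Country", how="left")

print(long_df.shape)
print(long_df.groupby("Year")["Country"].nunique())
```

Each year then lists all 8 countries, which is the consistency across frames that the animation relies on.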
vestland
  • Thanks. Both answers were very well thought out and presented. I accepted yours, but I think they're equally good. Ultimately, plotly express does not allow exactly what I want here, and the best way to go about it when one wants to use plotly is to establish the necessary consistency across the data, e.g. as described in your answer. I hope this helps others as well. – Joseph Doob Oct 29 '20 at 13:38