How to plot line on different columns according to categorical variable

Question

I have a dataframe containing a time series with two columns as follows:

            dailyp  kind
date        
2015-01-01  165.0   national
2015-01-02  210.0   not_festive
2015-01-03  222.0   not_festive
2015-01-04  190.0   not_festive
2015-01-05  200.0   not_festive
...     ...     ...
2019-12-28  260.0   not_festive
2019-12-29  226.0   not_festive
2019-12-30  216.0   not_festive
2019-12-31  189.0   not_festive
2020-01-01  237.0   not_festive

I have written a function that plots the time series, with the color depending on the value of kind:

def plot_timeseries_by_category(df, category_col):
    # Get the unique years from the index
    years = df.index.year.unique()
    
    # Create a subplot for each year
    fig, axes = plt.subplots(len(years), 1, figsize=(10, len(years) * 5))
    if len(years) == 1:
        axes = [axes]
        
    for year, ax in zip(years, axes):
        # Filter the data for the current year
        df_year = df[df.index.year == year]
        
        # Create all necessary colors
        colors = {category: f'C{index}' for index, category in enumerate(df[category_col].unique())}
        print(df_year.index)
        # Groupby category and plot
        for category, group in df_year.groupby(category_col):
            
            group.plot('index', 'dailyp', marker='o', ax=ax, color=colors[category], label=category)
        
        ax.set_title(str(year))
        ax.legend()

The function also splits the time series by year, and that part works fine. However, instead of plotting a single line whose color changes with the category, it plots a separate line for each category. I want to achieve what is shown in the accepted answer to the post "Plot Multicolored line based on conditional in python", but I couldn't make it work.

Any help is appreciated!

What exactly do you want? You might find it helpful to use a pivot and second plot. Maybe a new feature will help. — Tornike Kharitonishvili, Feb 12 '23 at 06:17

Serge de Gosson de Varennes · Answer 1 · 2023-02-12T07:03:27.700

You can do it like this:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', 
             '2019-12-28', '2019-12-29', '2019-12-30', '2019-12-31', '2020-01-01'], 
    'dailyp': [165.0, 210.0, 222.0, 190.0, 200.0, 
                260.0, 226.0, 216.0, 189.0, 237.0], 
    'kind': ['national', 'not_festive', 'not_festive', 'not_festive', 'not_festive', 
             'not_festive', 'not_festive', 'not_festive', 'not_festive', 'not_festive']
}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')


def plot_timeseries_by_category(df, category_col):
    # Get the unique years from the index
    years = df.index.year.unique()

    # Create a subplot for each year
    fig, axes = plt.subplots(len(years), 1, figsize=(10, len(years) * 5))
    if len(years) == 1:
        axes = [axes]

    for year, ax in zip(years, axes):
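        # Filter the data for the current year; the second reset_index() adds an
        # 'index' column holding each row's position within the year (0, 1, 2, ...)
        df_year = df[df.index.year == year].reset_index().reset_index()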

        # Create all necessary colors
        colors = {category: f'C{index}' for index, category in enumerate(df[category_col].unique())}

        # Group by category and plot each group against its row position
        for category, group in df_year.groupby(category_col):
            group = group.set_index('index')
            ax.plot(group.index, group['dailyp'], marker='o', color=colors[category], label=category)

        ax.set_title(str(year))
        ax.legend()

plot_timeseries_by_category(df, 'kind')
plt.show()

This produces one subplot per year, with a separate line drawn for each category.
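
Note that the function above still draws one line per category. If a single continuous line that changes color with the category is wanted, as the question describes, one possible approach is matplotlib's LineCollection. The following is only a minimal sketch: the helper name plot_multicolored_line is hypothetical, and coloring each segment by the category of the point it starts from is an assumption (the category of the end point would be equally valid).

```python
import matplotlib.colors as mcolors
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection
from matplotlib.lines import Line2D


def plot_multicolored_line(df, value_col, category_col, ax=None):
    """Draw one continuous line whose segments are colored by category.

    Hypothetical helper; expects a DataFrame with a DatetimeIndex.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 5))

    # One color per category, taken from the default color cycle
    categories = list(df[category_col].unique())
    colors = {c: mcolors.to_rgba(f"C{i}") for i, c in enumerate(categories)}

    # Convert dates to matplotlib's numeric date format so that x and y
    # can be stacked into a single float array
    x = mdates.date2num(df.index.to_pydatetime())
    y = df[value_col].to_numpy(dtype=float)

    # Build the segments between consecutive points: shape (n - 1, 2, 2)
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)

    # Assumption: a segment takes the color of the point it starts from
    segment_colors = [colors[c] for c in df[category_col].iloc[:-1]]
    ax.add_collection(LineCollection(segments, colors=segment_colors))

    # Markers colored by each point's own category
    ax.scatter(x, y, c=[colors[c] for c in df[category_col]], zorder=3)

    # Rescale the axes to the data and format the x axis as dates
    ax.autoscale_view()
    ax.xaxis_date()

    # A LineCollection has no per-category labels, so build the legend by hand
    handles = [Line2D([0], [0], color=colors[c], marker="o", label=c)
               for c in categories]
    ax.legend(handles=handles)
    return ax


# Sample data copied from the first rows shown in the question
sample = pd.DataFrame(
    {
        "dailyp": [165.0, 210.0, 222.0, 190.0, 200.0],
        "kind": ["national", "not_festive", "not_festive",
                 "not_festive", "not_festive"],
    },
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03",
                          "2015-01-04", "2015-01-05"]),
)
ax = plot_multicolored_line(sample, "dailyp", "kind")
```

To keep one panel per year, the helper can be called inside the existing loop with each year's slice and the matching ax, followed by plt.show() as before.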
