Matplotlib: y-axis normalised

Question

I have the following dataset

Date              Type        Label
2020-03-20         A            1
2020-03-20         A            0
2020-03-19         B            1
2020-03-17         A            1
2020-03-15         C            0
2020-03-19         A            0
2020-03-20         D            1
2020-03-20         A            1

that I would like to plot as a multi-line plot with normalised y-values. The code below plots the different lines through time:

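import matplotlib.pyplot as plt

# df holds the dataset shown above
fig, ax = plt.subplots(1, figsize=[10, 6])

# Count the Label == 1 rows per (Date, Type), spread the Types into columns,
# forward-fill the gaps, and draw one line per Type
(df.loc[df.Label.eq(1)].groupby(["Date", "Type"]).agg({"Type": "count"})
 .unstack(1).droplevel(0, axis=1)
 .ffill()
 .plot(ax=ax, kind="line")
)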

but when I try to apply normalisation

column_norm=['Type']
df[column_norm] = df[column_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

it fails when I calculate min and max, returning this error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

How can I get a plot with the y-axis normalised to 1?

I get the error: TypeError: unsupported operand type(s) for -: 'str' and 'str' when I calculate min and max — LdM, Feb 25 '21 at 00:25
`df['Type']` is a string column. What is your expected outcome of a string subtracted from a string? — G. Anderson, Feb 25 '21 at 00:27
@G.Anderson, I believe his `groupby()` above should produce integer counts. — NotAName, Feb 25 '21 at 00:29
But anyway, just try explicitly converting whatever column to numeric, for example with `df = df.astype({'column_norm': int})`, and see how that goes. — NotAName, Feb 25 '21 at 00:30
I tried that and got a new error: KeyError: 'Only a column name can be used for the key in a dtype mappings argument.' — LdM, Feb 25 '21 at 00:38
It would help to see where in your code you're trying to apply the normalization — G. Anderson, Feb 25 '21 at 00:52
@pavel after the groupby, `'Type'` is no longer in the column labels, which would throw a `KeyError` instead — G. Anderson, Feb 25 '21 at 00:54
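Following G. Anderson's point, the normalisation has to run on the numeric count table produced by the groupby, whose columns are the Type values, rather than on the string column `Type`. A minimal sketch on the question's sample rows, assuming that "normalised to 1" means dividing each line by its own maximum. The min-max formula from the question would also work on these columns, but a column that never changes (here B and D after forward-filling) would become all NaN because of 0/0.

```python
import pandas as pd

# The sample dataset from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-20", "2020-03-20", "2020-03-19", "2020-03-17",
                            "2020-03-15", "2020-03-19", "2020-03-20", "2020-03-20"]),
    "Type": ["A", "A", "B", "A", "C", "A", "D", "A"],
    "Label": [1, 0, 1, 1, 0, 0, 1, 1],
})

# Count the Label == 1 rows per (Date, Type); the Type values become columns
counts = (df[df.Label.eq(1)]
          .groupby(["Date", "Type"]).size()
          .unstack("Type")
          .ffill())

# Normalise the numeric count columns (not the string 'Type' column)
# so that each line peaks at 1
norm = counts / counts.max()
print(norm)
```

The resulting `norm` frame can then be passed to `.plot(ax=ax, kind="line")` in the same way as the chain in the question.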

score 2 · Accepted Answer · answered Feb 26 '21 at 20:37

Based on the small sample of data and the way you use count and fillna in the code you shared, I assume that you want to compute the normalized (rescaled) cumulative sum of the count of labels equal to one through time. Here is a step-by-step example of how to do this using a larger sample dataset:

import numpy as np   # v 1.19.2
import pandas as pd  # v 1.1.3

# Create sample dataset
rng = np.random.default_rng(seed=1)  # random number generator
dti = pd.date_range('2020-01-01', '2020-01-31', freq='D')
size = 2*dti.size
dfraw = pd.DataFrame(data=dict(Type = rng.choice(list('ABCD'), size=size),
                               Label = rng.choice([0,1], size=size),
                               Date = rng.choice(dti, size=size)))
dfraw.head()

You can simplify the reshaping of the dataframe by using the pivot_table method. Notice how the df.Label.eq(1) mask and the count aggregation function are replaced here by aggfunc='sum', which takes advantage of the fact that Label is numeric (0 or 1), so summing it counts the ones:

dfp = dfraw.pivot_table(values='Label', index='Date', columns='Type', aggfunc='sum')
dfp.head()
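To check that summing the 0/1 Label gives the same numbers as the mask-plus-count approach, here is a small sketch on the question's sample rows. The two results differ only in where they have gaps: the masked count has no row or column for dates and types without any Label == 1 (type C never has one), while the pivot table contains those cells. So the sketch aligns the two before comparing; `fill_value=0` is used only to make that comparison simple.

```python
import pandas as pd

# The sample dataset from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-20", "2020-03-20", "2020-03-19", "2020-03-17",
                            "2020-03-15", "2020-03-19", "2020-03-20", "2020-03-20"]),
    "Type": ["A", "A", "B", "A", "C", "A", "D", "A"],
    "Label": [1, 0, 1, 1, 0, 0, 1, 1],
})

# Question's approach: keep the Label == 1 rows, then count rows per (Date, Type)
counts = (df[df.Label.eq(1)]
          .groupby(["Date", "Type"]).size()
          .unstack("Type", fill_value=0))

# Answer's approach: sum the 0/1 Label values per (Date, Type)
pv = df.pivot_table(values="Label", index="Date", columns="Type",
                    aggfunc="sum", fill_value=0)

# Align the masked count to the pivot table's dates and types before comparing
counts = counts.reindex(index=pv.index, columns=pv.columns, fill_value=0)
print((counts.to_numpy() == pv.to_numpy()).all())  # True
```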

Then the normalized/rescaled cumulative sum can be computed for each Type column using the apply method:

dfcs = dfp.apply(lambda x: x.cumsum()/x.sum(), axis=0)
dfcs.head()
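Because `cumsum` and `sum` already operate column by column, the `apply` call can also be written as plain vectorised arithmetic; dividing a DataFrame by the Series of column totals aligns the totals with the columns. A sketch on a tiny made-up table (NaN marks a Date/Type with no rows), showing that both forms agree and that each column ends at 1:

```python
import numpy as np
import pandas as pd

# Small made-up stand-in for dfp
dfp = pd.DataFrame({"A": [1.0, np.nan, 0.0, 2.0],
                    "B": [np.nan, 1.0, 1.0, np.nan]},
                   index=pd.date_range("2020-03-17", periods=4, freq="D"))

# Column-wise apply, as in the answer
dfcs_apply = dfp.apply(lambda x: x.cumsum() / x.sum(), axis=0)

# Vectorised equivalent: cumulative sums divided by each column's total
dfcs_vec = dfp.cumsum() / dfp.sum()
print(dfcs_vec)
```

`cumsum` skips NaN, so the gaps stay NaN in both versions. A column whose labels are all 0 has a total of 0, so both forms give NaN (0/0) for it, and the final fill step turns it into a flat line at 0.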

Finally, the NaN values can be filled to make the lines in the plot continuous:

df = dfcs.ffill().fillna(0)
df.head()
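The two fills do different jobs: forward-filling carries each line's last value across dates with no rows, but it cannot fill the NaN values before a Type's first count, which is what the fill with 0 handles. `.ffill()` does the same as `fillna(method='ffill')`, which recent pandas versions deprecate. A sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Small made-up stand-in for dfcs
dfcs = pd.DataFrame({"A": [1/3, np.nan, 1/3, 1.0],
                     "B": [np.nan, 0.5, 1.0, np.nan]})

step1 = dfcs.ffill()      # carries the last value forward; B's leading NaN stays
filled = step1.fillna(0)  # values before the first count become 0
print(filled)
```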

ax = df.plot(figsize=(10,6))

# Format the tick labels using the default tick locations and format legend
ticks = ax.get_xticks()
ticklabels = pd.to_datetime(ticks, unit='D').strftime('%d-%b')
ax.set_xticks(ticks)
ax.set_xticklabels(ticklabels, rotation=0, ha='center')
ax.legend(title='Type', frameon=False);
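The tick conversion with `pd.to_datetime(ticks, unit='D')` appears to rely on pandas plotting a daily series with day numbers (days since 1970-01-01) on the x-axis. An alternative sketch, on a small made-up frame, uses pandas' `x_compat=True` option to turn off its own time-series axis handling so that a matplotlib `DateFormatter` can format the dates directly:

```python
import matplotlib.dates as mdates
import pandas as pd

# Small made-up stand-in for the normalised frame df
df = pd.DataFrame({"A": [0.25, 0.5, 1.0], "B": [0.0, 0.5, 1.0]},
                  index=pd.date_range("2020-01-01", periods=3, freq="D"))

# x_compat=True makes the x-values regular matplotlib dates
ax = df.plot(figsize=(10, 6), x_compat=True)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b"))
ax.legend(title="Type", frameon=False)
```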

The last part, from "ax = df.plot(figsize=(10,6))" down, does not display a plot... What should I do? — just_learning, Mar 09 '21 at 20:00
@just_learning I ran this code in Jupyter Notebook which displays plots automatically thanks to [IPython](https://ipython.readthedocs.io/en/stable/interactive/plotting.html#rich-outputs) (v7.21.0 with default settings). If you are not using IPython try adding `ax.figure.show()` at the end of the code or `plt.show()` as shown [here](https://stackoverflow.com/questions/8575062/how-to-show-matplotlib-plots-in-python). — Patrick FitzGerald, Mar 10 '21 at 09:42
Neither "ax.figure.show()" nor "plt.show()" works! I am using Ubuntu 20.04.1 LTS. — just_learning, Mar 10 '21 at 11:41
@just_learning I am using Windows 10 and I can print the plot in regular Python just the same as with IPython so I am not able to be of further help on this issue as I cannot reproduce the problem. Maybe one of the answers [here](https://stackoverflow.com/questions/7534453/matplotlib-does-not-show-my-drawings-although-i-call-pyplot-show) (or among the linked questions) may help. — Patrick FitzGerald, Mar 10 '21 at 12:45
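The thread does not settle why no window appears on Ubuntu. One common cause (an assumption here, not confirmed in the thread) is that matplotlib is running with a non-interactive backend such as Agg, for example when no GUI toolkit is installed, in which case `plt.show()` has nothing to open. A sketch that prints the active backend and, as a fallback, saves the figure to a PNG file; the file name is hypothetical and the frame is a made-up stand-in:

```python
import os
import tempfile

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# A backend name such as 'agg' means no window can be shown
print(matplotlib.get_backend())

# Small made-up stand-in for the normalised frame df
df = pd.DataFrame({"A": [0.25, 0.5, 1.0], "B": [0.0, 0.5, 1.0]},
                  index=pd.date_range("2020-01-01", periods=3, freq="D"))
ax = df.plot(figsize=(10, 6))
ax.legend(title="Type", frameon=False)

# Save to an image file instead of relying on a window
out = os.path.join(tempfile.gettempdir(), "normalised_lines.png")  # hypothetical name
ax.figure.savefig(out)
plt.close(ax.figure)
```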
