I am working with 3 pandas dataframes
having the same column structures(number and type), only that the datasets are for different years.
I would like to plot the ECDF
for each of the dataframes, but everytime I do this, I do it individually (lack python skills). So also, one of the figures (2018) is scaled differently on x-axis
making it a bit difficult to compare. Here's how I do it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from empiricaldist import Cdf
df1 = pd.read_csv('2016.csv')
df2 = pd.read_csv('2017.csv')
df3 = pd.read_csv('2018.csv')
#some info about the dfs
df1.columns.values
array(['id', 'trip_id', 'distance', 'duration', 'speed', 'foot', 'bike',
'car', 'bus', 'metro', 'mode'], dtype=object)
modal_class = df1['mode']
print(modal_class[:5])
0 bus
1 metro
2 bike
3 foot
4 car
def decorate_ecdf(title, x, y):
plt.xlabel(x)
plt.ylabel(y)
plt.title(title)
#plotting the ecdf for 2016 dataset
for name, group in df1.groupby('mode'):
Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2016)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
#plotting the ecdf for 2017 dataset
for name, group in df2.groupby('mode'):
Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2017)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
#plotting the ecdf for 2018 dataset
for name, group in df3.groupby('mode'):
Cdf.from_seq(group.speed).plot()
title, x, y = 'Speed distribution by travel mode (April 2018)','speed (m/s)', 'ECDF'
decorate_ecdf(title,x,y)
Output:
I am pretty sure this isn't the pythonist way of doing it, but a dirty way to get the work done. You can also see how the 2018 plot is scaled differently on the x-axis.
- Is there a way to enforce that all figures are scaled the same way?
- How do I re-write my code such that the figures are plotted by calling a function once?