1

I have a list of dataframes and want to draw a line chart for each one as a subplot. The dataframes in the list share the same column structure. I want to make the subplots (line charts) with as little code as possible. I referenced this post on SO but didn't get the correct subplots. Below is my current approach for one dataframe:

reproducible data:

Here is a minimal list of dataframes in a gist file, concatenated from the list of dataframes. Each dataframe looks like this:

[![enter image description here][3]][3]

my initial approach

import matplotlib.pyplot as plt

df1 = list_of_df[1]
fig, ax = plt.subplots(figsize=(14, 8))
# one line per year column, all taken from df1
plt.plot(df1.index, df1['2014'], label="2014")
plt.plot(df1.index, df1['2015'], label="2015")
plt.plot(df1.index, df1['2016'], label="2016")
plt.plot(df1.index, df1['2017'], label="2017")
plt.plot(df1.index, df1['2018'], label="2018")
# dashed line for the 5-year average
plt.plot(df1.index, df1['avg'], "--", label="5-Yr-Avg")
plt.legend()  # show the labels set above
plt.show()

my initial output for a single dataframe:

[![enter image description here][4]][4]

I am trying to loop through this list of dataframes to get subplots, so that it is easier to compare two subplots with different data, but I couldn't get that to work. How can I make this happen? Any ideas? Thanks.

If I use the solution from the SO post:

nrow=2
ncol=2
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
    for c in range(ncol):
        list_of_df[count].plot(ax=axes[r,c])
        count=+1

The output of this code is not correct. I am expecting two subplots for two dataframes. How can I fix this? I think iterating over the columns was wrong, which is why I got 6 subplots; I should iterate over the index of the dataframe. Any ideas?
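Two details in the loop above are worth noting: `count=+1` assigns positive one instead of incrementing (`count += 1`), and a 2x2 grid creates four axes although only two dataframes are expected. A minimal sketch of a loop that pairs each dataframe with exactly one axis follows. The helper name `plot_frames` and the stand-in data are hypothetical and only for illustration; the real `list_of_df` from the gist would be passed in instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_frames(frames, ncols=1, figsize=(10, 8)):
    """Draw each DataFrame in `frames` as a line chart on its own subplot."""
    nrows = -(-len(frames) // ncols)  # ceiling division
    # squeeze=False keeps `axes` two-dimensional even for a single row or column
    fig, axes = plt.subplots(nrows, ncols, sharex=True, squeeze=False, figsize=figsize)
    flat_axes = axes.ravel()
    for ax, df in zip(flat_axes, frames):
        df.plot(ax=ax)  # one line per column, x-axis taken from the index
    # hide any unused cells in the grid
    for ax in flat_axes[len(frames):]:
        ax.set_visible(False)
    return fig, axes


# hypothetical stand-in data with the same columns in both frames
example_frames = [
    pd.DataFrame({"2014": [1, 2, 3], "2015": [2, 3, 4], "avg": [1.5, 2.5, 3.5]}),
    pd.DataFrame({"2014": [3, 2, 1], "2015": [4, 3, 2], "avg": [3.5, 2.5, 1.5]}),
]
fig, axes = plot_frames(example_frames)
```

Calling `plt.show()` afterwards displays the figure. Using `zip` over the flattened axes avoids a manual counter altogether, which is one way to sidestep the `count=+1` problem.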

my new attempt:

I am trying to reduce the number of lines of code in my initial attempt. Since I have a list of dataframes, I might code it as follows:

fig, ax = plt.subplots(figsize=(10,8))
for x in range(len(df_list)):
    ax.plot(df_list[x].index, df_list[x].columns, kind='line')

plt.show()

but this gave me a ValueError as follows:

ValueError: x and y must have same first dimension, but have shapes (12,) and (6,)

Why is this error raised? Is there any way to generalize my initial implementation to a list of dataframes for making subplots? Any ideas?
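The shapes in the error message point at the cause: `df.columns` holds the column labels (6 of them), not the data, while the index has 12 entries, so `ax.plot` receives an x of length 12 and a y of length 6. The sketch below reproduces those shapes with a hypothetical 12-row, 6-column frame (the values are made up) and plots each column's values against the index instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical frame: 12 rows and 6 columns, matching the shapes in the error
cols = ["2014", "2015", "2016", "2017", "2018", "avg"]
df = pd.DataFrame(np.arange(72).reshape(12, 6), index=range(1, 13), columns=cols)

print(df.index.shape, df.columns.shape)  # prints (12,) (6,)

fig, ax = plt.subplots()
for col in df.columns:
    # df[col] holds 12 values, one per row, so it matches df.index
    ax.plot(df.index, df[col], label=col)
ax.legend()
```

Note also that `ax.plot` is the matplotlib method and does not take `kind='line'`; that keyword belongs to pandas' `DataFrame.plot`.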

Hamilton

3 Answers

3

I figured out a solution for this; I hope it will be helpful to others. Since the input is a list of dataframes, it is easier to do it as follows:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from itertools import cycle

plt.style.use('ggplot')  # set the style before the figure is created

df1, df2 = list_of_df[0], list_of_df[1]

# one color per column, so a given column looks the same in both subplots
colors = cm.tab10(np.linspace(0, 1, len(df1.columns)))
linecycler = cycle(["-", "--", "-.", ":"])
markercycler = cycle(('+', 'o', '*', 'v', '^', '<', '>'))

fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))
for i, col in enumerate(df1.columns):
    # pick the style once per column so both subplots match
    style, marker = next(linecycler), next(markercycler)
    ax1.plot(df1.index, df1[col], style, marker=marker, color=colors[i], linewidth=3, label=col)
    ax2.plot(df2.index, df2[col], style, marker=marker, color=colors[i], linewidth=3)

# shared legend in the space that tight_layout leaves free on the right
fig.legend(loc='center right')
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()

I get my expected output. There may be more efficient code, so if anyone has a better idea, please let me know. Thanks.

Hamilton
2

Here's a full working example of what you want to achieve:

import pandas as pd
import matplotlib.pyplot as plt

df_1 = pd.DataFrame({'2010':[10,11,12,13],'2011':[14,18,14,15],'2012':[12,13,14,13]})
df_2 = pd.DataFrame({'2010':[10,11,12,13],'2011':[14,18,14,15],'2012':[12,13,14,13]})
df_3 = pd.DataFrame({'2010':[10,11,12,13],'2011':[14,18,14,15],'2012':[12,13,14,13]})
list_df = [df_1, df_2, df_3]
# one subplot per dataframe, laid out in a single row
for i, df in enumerate(list_df):
    ax = plt.subplot(1, len(list_df), i + 1)
    # one line per column
    for col in df:
        ax.plot(df.index, df[col], label=col)
    ax.legend()
plt.show()

Edit:

Given your answer which seems to fully address the issue, I would change a few lines to make it a bit more efficient. Below the code, I will upload some information regarding performance:

for i in range(df1.shape[1]):  ->  for i in range(len(df1.columns)):

leg_text = df1.columns.tolist() #What's the use of this line?

Speed comparison between shape, len and some other methods:

from timeit import timeit

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

ns = np.power(10, np.arange(5))
results = pd.DataFrame(
    columns=ns,
    index=pd.MultiIndex.from_product(
        [['len', 'len(tolist)', 'len(values)', 'shape'],
         ns]))
# one all-zero frame for every (rows, columns) combination
dfs = {(n, m): pd.DataFrame(np.zeros((n, m))) for n in ns for m in ns}

for n, m in dfs.keys():
    df = dfs[(n, m)]
    results.loc[('len', n), m] = timeit('len(df.columns)', 'from __main__ import df', number=10000)
    results.loc[('len(tolist)', n), m] = timeit('len(df.columns.tolist())', 'from __main__ import df', number=10000)
    results.loc[('len(values)', n), m] = timeit('len(df.columns.values)', 'from __main__ import df', number=10000)
    results.loc[('shape', n), m] = timeit('df.values.shape[1]', 'from __main__ import df', number=10000)
fig, axes = plt.subplots(2, 3, figsize=(9, 6), sharex=True, sharey=True)
# one bar chart per column count m; DataFrame.items() replaces the removed iteritems()
for i, (m, col) in enumerate(results.items()):
    r, c = i // 3, i % 3
    col.unstack(0).plot.bar(ax=axes[r, c], title=m)

Output (total seconds for 10,000 calls; each row is a method and a row count n, each column is a column count m):

                         1      10     100    1000   10000
len 1               0.0038  0.0046  0.0032  0.0037  0.0035
len 10              0.0032  0.0032  0.0032  0.0034  0.0035
len 100             0.0032  0.0052  0.0052  0.0053  0.0035
len 1000            0.0037  0.0036  0.0041  0.0039  0.0043
len 10000           0.0040  0.0038  0.0045  0.0043  0.0123
len(tolist) 1       0.0051  0.0075  0.0175  0.1629  1.6579
len(tolist) 10      0.0051  0.0059  0.0175  0.1588  1.9253
len(tolist) 100     0.0049  0.0097  0.0196  0.1635  1.7422
len(tolist) 1000    0.0053  0.0065  0.0198  0.1831  1.9897
len(tolist) 10000   0.0057  0.0069  0.0218  0.1995  2.2426
len(values) 1       0.0083  0.0097  0.0073  0.0074  0.0074
len(values) 10      0.0073  0.0072  0.0073  0.0107  0.0087
len(values) 100     0.0075  0.0094  0.0109  0.0072  0.0081
len(values) 1000    0.0081  0.0082  0.0081  0.0085  0.0088
len(values) 10000   0.0087  0.0084  0.0103  0.0101  0.0327
shape   1           0.1108  0.0838  0.0789  0.0779  0.0780
shape   10          0.0764  0.0770  0.0771  0.1118  0.0806
shape   100         0.0952  0.0826  0.1013  0.0800  0.0889
shape   1000        0.0881  0.0863  0.0867  0.0938  0.1063
shape   10000       0.0905  0.0999  0.1043  0.1013  0.2384
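Note that the `shape` rows above time `df.values.shape[1]`, which goes through `.values` first, so it is not quite the same expression as `df1.shape[1]` used in the loop being discussed. A small sketch for timing the expressions directly is shown here; the frame size is an arbitrary assumption, and the timings will vary by machine and pandas version, so none are quoted.

```python
from timeit import timeit

import numpy as np
import pandas as pd

# hypothetical frame size, chosen only for illustration
df = pd.DataFrame(np.zeros((1000, 100)))

candidates = {
    "len(df.columns)": lambda: len(df.columns),
    "df.shape[1]": lambda: df.shape[1],
    "df.values.shape[1]": lambda: df.values.shape[1],
}

# every expression should report the same number of columns
assert all(fn() == 100 for fn in candidates.values())

# total time in seconds for 10,000 calls of each expression
timings = {name: timeit(fn, number=10_000) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<20} {seconds:.4f} s per 10,000 calls")
```

For a loop that runs once per column of a small dataframe, any difference between these is unlikely to matter in practice; readability is probably the better criterion.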
Celius Stingher
-1
import pandas as pd
import matplotlib.pyplot as plt

df_1 = pd.DataFrame({'A':[15,16,17,20],'B':[21,22,23,24],'C':[25,26,27,28]})
df_2 = pd.DataFrame({'A':[15,16,17,20],'B':[21,22,23,24],'C':[25,26,27,28]})
df_3 = pd.DataFrame({'A':[15,16,17,20],'B':[21,22,23,24],'C':[25,26,27,28]})
list_df = [df_1, df_2, df_3]
for i, df in enumerate(list_df):
    fig = plt.figure(i)
    # draw on the figure just created; without ax=, DataFrame.plot opens a new figure
    df.plot(kind='line', ax=fig.gca())
plt.show()

I find this quite an easy approach to understand. Note that it draws each dataframe in its own figure rather than in subplots of one figure, but it can easily be adapted to match your requirements.

Akhil Sharma