I have a DataFrame whose index holds datetime objects and whose data are sales counts. There are usually several sales per day, and the number of sales differs from day to day. I would like to create a box plot over a date range with nicely formatted xticklabels that adapt to how many days are shown. Kind of like this. I've tried several variants of the code but have been unsuccessful so far. Could someone look at my script below and help me?

import pandas as pd
import matplotlib.pyplot as plt

index1 = ['2017-07-01','2017-07-01','2017-07-02','2017-07-02','2017-07-03','2017-07-03','2017-07-03']
index2 = pd.to_datetime(index1,format='%Y-%m-%d')

df = pd.DataFrame([[123456],[123789],[123654],[654321],[654987],[789456],[789123]],columns=['Count'],index=index1)

df.plot(kind='box')
plt.show()
Daniel

1 Answer

Use `.T` to transpose the dataframe, so that each record becomes its own column before plotting.

df.T.plot(kind='box', figsize=(10,7))

Okay, to get one box per date while keeping the repeated dates as separate records, reshape so that each date becomes a column:

df.reset_index().set_index('index',append=True).unstack()['Count'].plot(kind='box',figsize=(10,7))

This is better; it does the same reshape more directly (it needs `import numpy as np`):

df.set_index(np.arange(len(df)),append=True).unstack(0)['Count']\
  .plot(kind='box',figsize=(10,7))
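
To see what the two reshapes above actually produce, here is a minimal self-contained sketch using the numbers from the question. It assumes the index is built with `pd.to_datetime` (the question creates `index2` but passes the raw strings), and the names `wide_a`, `wide_b` and `summary` are only illustrative. Both variants should give a table with one column per date and `NaN` where a day has fewer records; the per-column min, median and max are what each box would summarize.

```python
import numpy as np
import pandas as pd

# Data from the question. Using a datetime index here is an assumption;
# the original frame was built from the raw date strings.
dates = pd.to_datetime(
    ["2017-07-01", "2017-07-01", "2017-07-02", "2017-07-02",
     "2017-07-03", "2017-07-03", "2017-07-03"],
    format="%Y-%m-%d",
)
df = pd.DataFrame(
    {"Count": [123456, 123789, 123654, 654321, 654987, 789456, 789123]},
    index=dates,
)

# Variant 1: move the dates into a column, append them to the row index,
# then unstack them (the last index level) into columns.
wide_a = df.reset_index().set_index("index", append=True).unstack()["Count"]

# Variant 2: append a running record number to the date index,
# then unstack level 0 (the dates) into columns.
wide_b = df.set_index(np.arange(len(df)), append=True).unstack(0)["Count"]

# One column per date, one row per record; NaN marks "no record here".
print(wide_b)

# The statistics each box is drawn from (NaN values are skipped).
summary = wide_b.agg(["min", "median", "max"])
print(summary)
```

Calling `.plot(kind='box', figsize=(10,7))` on either `wide_a` or `wide_b` then draws one box per date.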

Scott Boston
  • Is there a way for the xticklabels to be formatted like matplotlib does? – Daniel Sep 07 '17 at 12:23
  • Yes, you can format your xticklabels. See this [SO post](https://stackoverflow.com/questions/12945971/pandas-timeseries-plot-setting-x-axis-major-and-minor-ticks-and-labels) – Scott Boston Sep 07 '17 at 12:51
  • That's awesome, except how do you solve for the duplicate days? The boxplot is supposed to depict the min, median, and max for each day. – Daniel Sep 08 '17 at 14:45
  • You can use the drop_duplicates method of dataframe and give it a subset. – Scott Boston Sep 08 '17 at 15:12
  • How can you do that if the columns are labeled with the same date? I don't think pandas will grab all the columns with something like `df2 = df[['2017-07-01']]`. – Daniel Sep 08 '17 at 17:06
  • `df.groupby(level=0).sum().T.plot(kind='box',figsize=(10,7))` – Scott Boston Sep 08 '17 at 17:09
  • I can check this in an hour. It's not going to just give me one record that sums all variables for each date, is it? I'm giving you the credit for answering. I really appreciate you helping me learn this. – Daniel Sep 08 '17 at 17:11
  • `df.reset_index().set_index('index',append=True).unstack()['Count'].plot(kind='box',figsize=(10,7))` Maybe you want to keep those as separate records. – Scott Boston Sep 08 '17 at 17:13
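
On the xticklabels question raised in the comments, one possible approach is sketched below. A box plot places its boxes at positions 1, 2, 3, ... and not on a real date axis, so this sketch simply overwrites the tick labels with formatted date strings. The `%m-%d` format, the 45 degree rotation and the names `wide` and `labels` are assumptions chosen for illustration, and the index is again assumed to be converted with `pd.to_datetime`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same data as in the question, with a datetime index (an assumption).
dates = pd.to_datetime(
    ["2017-07-01", "2017-07-01", "2017-07-02", "2017-07-02",
     "2017-07-03", "2017-07-03", "2017-07-03"],
    format="%Y-%m-%d",
)
df = pd.DataFrame(
    {"Count": [123456, 123789, 123654, 654321, 654987, 789456, 789123]},
    index=dates,
)

# One column per date, as in the answer.
wide = df.set_index(np.arange(len(df)), append=True).unstack(0)["Count"]

# One box per date.
ax = wide.plot(kind="box", figsize=(10, 7))

# Replace the default labels with a shorter date format
# (one label per box, in column order).
labels = [day.strftime("%m-%d") for day in wide.columns]
ax.set_xticklabels(labels, rotation=45)
ax.set_ylabel("Count")
plt.tight_layout()
```

Finish with `plt.show()` or `ax.figure.savefig(...)` as needed. With many days on the axis, passing an empty string for some of the entries in `labels` is one way to thin the labels out.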