I'm looking for a more Pythonic way of splitting a very large plot into several subplots, one per month (February, March, etc.).

I have converted all of the date values in the df to DateTime using

pd.to_datetime(df['dates'])

I was then successful in creating new variables containing slices of my dataframe based on the desired date ranges, but this doesn't seem like the most efficient/reproducible method. My initial thought process was to set a limitation on the x-axis using datetime() and passing two arguments for the date ranges I needed. Still not extremely efficient, but my initial dataset only has five months.

plt.figure(1)
plt.subplot(511)
plt.plot(x['dates'], y, marker='o')
plt.xticks(rotation='vertical')
plt.rcParams['figure.figsize'] = (30,10)
plt.xlabel('time')
plt.ylabel('day-over-day change')
plt.xlim([datetime.date.strftime(2019, 2, 1), 
datetime.date.strftime(2019, 2, 28)])
plt.show()

I'm expecting a small subplot containing all data points that fall between 2/1/2019 and 2/28/2019 but when I run this code I get a type error that reads as:

TypeError: descriptor 'strftime' requires a 'datetime.date' object but 
received a 'int'

EDIT: I've also tried

plt.xlim([datetime.date(2019, 2, 1), datetime.date(2019, 2, 26)])

but that generates the error:

TypeError: datetime.date(2019, 2, 1) is not a string

That is why I'm attempting to use 'strftime'

END EDIT

While creating the correct number of subplots automatically would be ideal, for now I'm just interested in passing the right arguments to matplotlib.pyplot so I can make the data more digestible for my customer. If anyone wants to tackle iterating through the df to automatically determine the number of plots (and their proper segmentation), I would not object.

Nick Bohl
  • `strftime(date)` converts a date `date` to a string representation of that date. So (a) `strftime(year, month, day)` is not the correct usage of that function, and (b) This is not what you want here anyways. You want to limit by actual dates, possibly something like `datetime.date(2019, 2, 1)` – ImportanceOfBeingErnest Jun 10 '19 at 16:07
  • thank you for your comment. I tried that initially, but I got "TypeError: datetime.date(2019, 2, 1) is not a string". That is why I was attempting strftime. Not sure how to combat it otherwise. – Nick Bohl Jun 10 '19 at 16:26
  • In that case, unlike you claim, you haven't converted the date strings in the dataframe to dates. – ImportanceOfBeingErnest Jun 10 '19 at 16:30
  • so the command **pd.to_datetime(x['dates'])** is not sufficient for what I'm trying to accomplish? – Nick Bohl Jun 10 '19 at 16:37
  • It would be, but apparently you haven't used it. Always provide [mcve]s. – ImportanceOfBeingErnest Jun 10 '19 at 17:02
  • I apologize, I had an embarrassing typo. that worked once I corrected it, thank you. – Nick Bohl Jun 10 '19 at 17:15

1 Answer

Your call to strftime passes the wrong kind of input. strftime() formats an existing date object as a string; it does not build a date from a year, month, and day. To construct a date for the axis limits, use datetime.date(Y, M, D) instead.
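As a minimal sketch of that suggestion: the dataframe below is made-up sample data, and the column names `dates` and `change` are assumptions for illustration. The `Agg` backend and `savefig` call are only there so the example runs without a display. The key step is assigning the result of `pd.to_datetime` back to the column, since the function returns a new Series rather than converting in place; with real datetimes on the axis, `datetime.date` objects should be accepted as limits.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data with dates stored as strings, as they often are after loading a file.
df = pd.DataFrame({
    "dates": ["2019-01-30", "2019-02-01", "2019-02-15", "2019-02-28", "2019-03-02"],
    "change": [0.5, -0.2, 0.1, 0.4, -0.3],
})

# pd.to_datetime returns a new Series; assign it back or the column stays as strings.
df["dates"] = pd.to_datetime(df["dates"])

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(df["dates"], df["change"], marker="o")

# Limit the x-axis with date objects, not strings.
ax.set_xlim(datetime.date(2019, 2, 1), datetime.date(2019, 2, 28))

ax.tick_params(axis="x", rotation=90)
ax.set_xlabel("time")
ax.set_ylabel("day-over-day change")
fig.tight_layout()
fig.savefig("february.png")
```

If the column is still made of strings when it is plotted, matplotlib treats the x-axis as categorical, which is one way to end up with an "is not a string" error when date limits are passed.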

Also, you can use pandas' built-in features to split a time series into distinct time periods. Specifically, pd.Grouper() lets you group by a datetime index or column at a given frequency, such as month, week, or year. Here is some sample code that generates a dataframe and then splits it into a separate dataframe for each month:

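import datetime
import pandas as pd

dates = []
values = []

# Build sample data: the first nine days of January through November 2019
for i in range(1, 12):
    for j in range(1, 10):
        dates.append(datetime.date(2019, i, j))
        values.append(i * j)

# Convert the date objects to pandas timestamps
pd_time = pd.to_datetime(dates)

data = {"timestamp": pd_time, "values": values}
df = pd.DataFrame(data)

# Group on the datetime index by month; each group g is its own dataframe.
# Note: newer pandas versions deprecate freq='M' in favor of freq='ME'.
months = [g for n, g in df.set_index('timestamp').groupby(pd.Grouper(freq='M'))]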

Inspecting the resulting list, months, shows:

>>> months
[            values
timestamp
2019-01-01       1
2019-01-02       2
2019-01-03       3
2019-01-04       4
2019-01-05       5
2019-01-06       6
2019-01-07       7
2019-01-08       8
2019-01-09       9,             values
timestamp
2019-02-01       2
2019-02-02       4
2019-02-03       6
2019-02-04       8
2019-02-05      10
2019-02-06      12
2019-02-07      14
2019-02-08      16
2019-02-09      18,             values
...

See this SO thread for info on splitting time series by date range.

  • Thank you for this! For clarification, if I wanted these to plot themselves, regardless of the number of months in the dataset, would I simply embed code similar to this in a for-loop and iterate through the range of 'groups'? – Nick Bohl Jun 10 '19 at 17:14
  • Yes, the 'months' output is a list of pandas dataframes, so you could iterate over the list and create a plot for each item. – Luc Lapenta Jun 10 '19 at 18:02
  • great. I'll try to figure that on my own over time, but thank you again for your assistance. for now I'm just gonna work on getting these to plot the way I want – Nick Bohl Jun 10 '19 at 19:16
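The loop discussed in the comments above could be sketched as follows. This is one possible approach under stated assumptions, not the answerer's code: it reuses the made-up sample data from the answer, groups by `dt.to_period("M")` instead of `pd.Grouper` (a swapped-in technique that yields only months that actually contain data), and the output filename is arbitrary. The number of subplots is taken from the number of monthly groups, so nothing is hard-coded.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same made-up data as in the answer: nine days in each of eleven months.
dates = [datetime.date(2019, m, d) for m in range(1, 12) for d in range(1, 10)]
values = [d.month * d.day for d in dates]
df = pd.DataFrame({"timestamp": pd.to_datetime(dates), "values": values})

# One (period, dataframe) pair per calendar month that has data.
months = list(df.groupby(df["timestamp"].dt.to_period("M")))

# One row of subplots per month; squeeze=False keeps axes 2-D even for a single month.
fig, axes = plt.subplots(
    nrows=len(months), ncols=1, figsize=(10, 3 * len(months)), squeeze=False
)

for ax, (period, group) in zip(axes.ravel(), months):
    ax.plot(group["timestamp"], group["values"], marker="o")
    ax.set_title(str(period))  # e.g. "2019-02"
    ax.set_ylabel("values")
    ax.tick_params(axis="x", rotation=90)

fig.tight_layout()
fig.savefig("monthly_subplots.png")
```

Because each subplot receives only its own month's rows, no `xlim` call is needed; the axis limits follow the data in each group.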