I am trying to visualize the total duration of calls made in each time interval, where x is the month and y is the sum of the durations of all calls made during that month.

I have a main DataFrame df1 with various columns, from which I take the 'date' and 'duration' columns and resample them to a monthly period:

df2 = df1[['date', 'duration']]

monthly_df2 = df2.set_index('date').resample('M').sum()

I can get a nice DataFrame with the data I want:

2018-10-31  03:03:34
2018-11-30  03:22:21
2018-12-31  04:31:56
2019-01-31  04:02:31
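
For reference, a minimal sketch of this setup. The rows in df1 are made-up sample values (an assumption, not my real data), and I pass pd.offsets.MonthEnd() instead of the 'M' alias only because the alias may be handled differently across pandas versions:

```python
import pandas as pd

# Made-up sample calls (assumption): one date and one duration per call.
df1 = pd.DataFrame({
    "date": pd.to_datetime(
        ["2018-10-03", "2018-10-17", "2018-11-05", "2018-12-24"]
    ),
    "duration": pd.to_timedelta(
        ["00:10:00", "00:05:30", "01:00:00", "00:20:15"]
    ),
})

df2 = df1[["date", "duration"]]

# Sum the durations per calendar month; the index becomes the month end.
monthly_df2 = df2.set_index("date").resample(pd.offsets.MonthEnd()).sum()

# The summed column is still a timedelta dtype, not a plain number.
print(monthly_df2.dtypes)
```

The summed 'duration' column keeps its timedelta dtype, which is what the plotting methods then have to deal with.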

The problem starts when I want to plot this data:

  1. If I call the plot() method directly on the resampled DataFrame, I get a line graph where the y values are shown in nanoseconds, so the value 03:03:34 is turned into 11014000000000, and so on.

  2. When I use the .plot.bar() method, I get:

TypeError: Passing integers to fillna for timedelta64[ns] dtype is no longer supported. To obtain the old behavior, pass pd.Timedelta(seconds=n) instead.

I have looked through Stack Overflow and other resources, but all the solutions for bar plots were posted before pandas v1.0 was released, and I get the same TypeError when I try them.

pandas has moved to the Timedelta concept, but I cannot understand how to use it in my situation.

Could anyone suggest a good way to overcome this issue?

ArKey
  • I cannot reproduce this error. The plotting works for me. I am on 1.0.3 – mechanical_meat Apr 30 '20 at 16:35
  • I checked the ```monthly_df2.dtypes``` and get the following result: ```date datetime64[ns] ``` ```duration timedelta64[ns] ``` ```dtype: object``` Can it be the reason? – ArKey May 01 '20 at 04:42
  • I'm not sure why there's a `dtype: object` in there... that doesn't seem to lead to the error you're seeing, however. – mechanical_meat May 01 '20 at 04:59
  • @mechanical_meat so you mean I should have some ```datetime``` instead of ```object```? – ArKey May 01 '20 at 07:46
  • Right, exactly. – mechanical_meat May 01 '20 at 11:31
  • I think my main problem comes from the ```duration``` column being in ```timedelta64[ns]``` format. Resampling on my ```date``` column into some period works fine. The problem comes when I want to plot the result with the ```.plot()``` method: the duration is converted into nanoseconds and my ```y``` axis shows enormous numbers. And if I want to plot a bar graph, I get the ```TypeError``` I mentioned before. – ArKey May 02 '20 at 04:14
  • If I build my ```duration``` column with ```pd.to_datetime(value, format="%H:%M:%S")```, then in addition to the hours and minutes I get a YEAR-MONTH-DAY part like 1900-01-01. If I then resample by ```date``` and sum ```duration```, I receive ```NaN```. So I am looking for a method to solve this issue. – ArKey May 02 '20 at 04:17
  • Actually it works fine if my ```y``` column holds integer values and I then resample and use the ```.plot.bar()``` method. But my main idea is to show the duration as time. – ArKey May 02 '20 at 04:24
  • I'm glad you solved this! – mechanical_meat May 02 '20 at 17:32

1 Answer

A more accurate title for my question would be "Plot datetime.timedelta using matplotlib and python". The solution can also be found in this blog post.

In short, just change the dtype of your column with .astype('timedelta64[m]'). You can switch to hours, minutes or seconds by changing the value in the square brackets. This changes the dtype of your y column to float64, so you can then easily draw the bar graph or the line plot in normal units rather than nanoseconds.
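
A rough sketch of the same idea, using the monthly totals shown in the question. Here I use .dt.total_seconds() divided by 60 rather than the astype call, on the assumption that it gives float minutes regardless of pandas version (newer releases may treat that astype differently). The helper names to_minutes and plot_monthly are hypothetical, just for illustration:

```python
import pandas as pd

# Monthly totals as shown in the question output.
monthly = pd.DataFrame(
    {"duration": pd.to_timedelta(
        ["03:03:34", "03:22:21", "04:31:56", "04:02:31"]
    )},
    index=pd.to_datetime(
        ["2018-10-31", "2018-11-30", "2018-12-31", "2019-01-31"]
    ),
)


def to_minutes(durations: pd.Series) -> pd.Series:
    """Convert a timedelta Series to float minutes (hypothetical helper)."""
    return durations.dt.total_seconds() / 60


# A plain float column plots without any timedelta handling.
monthly["minutes"] = to_minutes(monthly["duration"])


def plot_monthly(frame: pd.DataFrame) -> None:
    """Draw a bar chart of minutes per month (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily, only needed to plot

    ax = frame["minutes"].plot.bar()
    ax.set_xlabel("month")
    ax.set_ylabel("total call duration (minutes)")
    plt.tight_layout()
    plt.show()
```

Calling plot_monthly(monthly) should then draw the bar chart with the y axis in minutes. Divide by 3600 instead of 60 in to_minutes if hours read better.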

ArKey