
I have the following time series, with ts as the index and a column named metric_value:

ts
2020-01-01 00:00:00    1225917
2020-01-01 01:00:00     670334
2020-01-01 02:00:00     668207
2020-01-01 03:00:00     576977
2020-01-01 04:00:00     713490
Name: metric_value, dtype: int32

I'm trying to plot this time series and mark the outlier data points with red circles. The outliers' index values are in this list:

idxs=['2020-06-06 19:00:00', '2020-07-04 19:00:00', '2020-08-08 19:00:00']

The following shows how I'm plotting the data for June.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, sharex="all", sharey="all", figsize=(12,4))
ax = ts.loc['2020-06-01 00:00:00':'2020-06-30 23:00:00']['metric_value'].plot(title='June')
ax = ts.loc[[idx for idx in idxs if idx>'2020-06-01 00:00:00' and idx<'2020-06-30 23:00:00']]['metric_value'].plot(style='.')

plt.xticks(rotation=45)
plt.ylim(bottom=0)

For June, there is one outlier, at index 2020-06-06 19:00:00. The problem is that the plot doesn't show this point at the correct location: it draws it at the first position, which is zero. I think this happens because the two plots don't share the x-axis, and the figure shows only the second plot's axis. How can I fix it? I tried this solution, but it didn't work.
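The plotting code above selects June's outliers with string comparisons. That works because the timestamps are zero-padded ISO strings, but the strict > and < exclude an outlier at exactly 2020-06-01 00:00:00 or 2020-06-30 23:00:00. A small sketch that parses the outlier list into timestamps first and filters with inclusive bounds (the names outlier_times and june_outliers are illustrative, not from the original code):

```python
import pandas as pd

# Outlier timestamps from the question, parsed into a DatetimeIndex
idxs = ['2020-06-06 19:00:00', '2020-07-04 19:00:00', '2020-08-08 19:00:00']
outlier_times = pd.to_datetime(idxs)

# Inclusive bounds, so outliers at the very first or last hour of June are kept
start = pd.Timestamp('2020-06-01 00:00:00')
end = pd.Timestamp('2020-06-30 23:00:00')
june_outliers = outlier_times[(outlier_times >= start) & (outlier_times <= end)]

print(list(june_outliers))  # [Timestamp('2020-06-06 19:00:00')]
```

The resulting timestamps can be passed directly to ts.loc[...] to fetch the matching rows.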


Birish
    Cannot reproduce the problem with the data provided. You need to provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) that includes a toy dataset (refer to [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)) – Diziet Asahi Dec 02 '20 at 10:20
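Following the comment's suggestion, a toy dataset with the same layout as the question's data makes the problem reproducible. A minimal sketch, assuming random hourly values (the numbers are synthetic, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: hourly random values with the question's layout
# (DatetimeIndex named "ts", one int32 column "metric_value")
rng = np.random.default_rng(0)
index = pd.date_range('2020-01-01', '2020-09-30 23:00', freq='h', name='ts')
ts = pd.DataFrame(
    {'metric_value': rng.integers(500_000, 1_300_000, size=len(index)).astype('int32')},
    index=index,
)
print(ts.head())
```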

2 Answers


You can set up two y-axes in the pandas plot as follows, drawing the outliers on a secondary y-axis. Does this solve your problem?

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hourly toy data: one random value per timestamp
date_rng = pd.date_range('2020-01-01', '2020-09-30', freq='1h')
value = np.random.randint(5000, 30000, size=len(date_rng))
ts = pd.DataFrame({'ts': date_rng, 'metric_value': value})
ts.set_index('ts', inplace=True)

idxs = ['2020-06-06 19:00:00', '2020-07-04 19:00:00', '2020-08-08 19:00:00']

# June as a line on the primary y-axis
ts.loc['2020-06-01 00:00:00':'2020-06-30 23:00:00', ['metric_value']].plot(title='June')
# June outliers as dots on a secondary y-axis
ts.loc[[idx for idx in idxs if idx>'2020-06-01 00:00:00' and idx<'2020-06-30 23:00:00']]['metric_value'].plot(style='.', secondary_y=True)

plt.xticks(rotation=45)
plt.ylim(bottom=0)

plt.show()
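With secondary_y the outlier is drawn against a separate right-hand y-axis whose scale is chosen independently of the line's. If the marker should sit on the line itself, one alternative is to draw both series with Matplotlib on a single Axes, so both go through the same datetime conversion. A sketch under these assumptions: synthetic data in the question's layout, June selected with partial-string indexing, and outliers matched with Index.isin (names such as june_outliers are illustrative):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data in the question's layout
index = pd.date_range('2020-01-01', '2020-09-30 23:00', freq='h', name='ts')
values = np.random.default_rng(0).integers(5000, 30000, size=len(index))
ts = pd.DataFrame({'metric_value': values}, index=index)
idxs = pd.to_datetime(['2020-06-06 19:00:00', '2020-07-04 19:00:00', '2020-08-08 19:00:00'])

june = ts.loc['2020-06']                      # partial-string indexing selects all of June
june_outliers = june[june.index.isin(idxs)]   # rows whose timestamp is in the outlier list

fig, ax = plt.subplots(figsize=(12, 4))
# Both calls pass datetime x values to the same Axes, so the line and
# the markers are converted to the same x units and share one x-axis
ax.plot(june.index, june['metric_value'], label='metric_value')
ax.plot(june_outliers.index, june_outliers['metric_value'], 'ro', markersize=6, label='outlier')
ax.set_title('June')
ax.set_ylim(bottom=0)
ax.tick_params(axis='x', labelrotation=45)
ax.legend()
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to view the result.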


r-beginners

I solved the issue as follows:

import matplotlib.pyplot as plt

# Flag the outlier rows in a boolean column
ts['flag'] = False
for idx in idxs:
    ts.loc[idx, 'flag'] = True

june = ts.loc['2020-06-01 00:00:00':'2020-06-30 23:00:00']
flagged = june[june['flag']]
# Use the flagged timestamps (not integer positions) as x values, and take y
# from the same June rows so both coordinates refer to the outliers
x = flagged.index
y = flagged['metric_value']

# x_compat=True turns off pandas' special time-series x-axis handling, so the
# line and the plt.plot() markers are placed in the same Matplotlib date units
june['metric_value'].plot(title='June', figsize=(12,4), x_compat=True)
if len(flagged) > 0:
    plt.plot(x, y, 'ro', markersize=4)

plt.xticks(rotation=45)
plt.ylim(bottom=0)
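The flag-column approach extends to several months. A sketch, assuming synthetic data and a hypothetical helper plot_month_with_outliers, that builds the flag column in one vectorised step with Index.isin and draws each month on its own subplot using datetime x values:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data in the question's layout
index = pd.date_range('2020-01-01', '2020-09-30 23:00', freq='h', name='ts')
values = np.random.default_rng(1).integers(5000, 30000, size=len(index))
ts = pd.DataFrame({'metric_value': values}, index=index)
idxs = ['2020-06-06 19:00:00', '2020-07-04 19:00:00', '2020-08-08 19:00:00']

# Boolean flag column, built without a Python loop
ts['flag'] = ts.index.isin(pd.to_datetime(idxs))

def plot_month_with_outliers(ax, frame, month):
    """Hypothetical helper: plot one month and circle its flagged points."""
    data = frame.loc[month]            # e.g. '2020-06' selects the whole month
    flagged = data[data['flag']]
    ax.plot(data.index, data['metric_value'])
    if not flagged.empty:
        ax.plot(flagged.index, flagged['metric_value'], 'ro', markersize=4)
    ax.set_title(pd.Timestamp(month).strftime('%B'))
    ax.set_ylim(bottom=0)
    ax.tick_params(axis='x', labelrotation=45)
    return len(flagged)

months = ['2020-06', '2020-07', '2020-08']
fig, axes = plt.subplots(len(months), 1, figsize=(12, 4 * len(months)))
counts = [plot_month_with_outliers(ax, ts, m) for ax, m in zip(axes, months)]
fig.tight_layout()
print(counts)
```

Passing the Axes explicitly keeps each month's line and markers together, instead of relying on whichever Axes is current.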
Birish