0

I have the following data:

Date
01/27/2020    55
03/03/2020    44
02/25/2020    39
03/11/2020    39
01/28/2020    39
02/05/2020    38
03/17/2020    37
03/16/2020    37
03/19/2020    37
03/14/2020    35
03/09/2020    35
03/26/2020    33
03/06/2020    33
01/29/2020    33
03/23/2020    27
03/15/2020    27
02/26/2020    27
03/27/2020    26
03/02/2020    25
02/28/2020    25
03/24/2020    24
03/04/2020    24
01/21/2020    23
03/01/2020    21
02/27/2020    21
01/22/2020    21
02/18/2020    18
01/31/2020    18
03/22/2020    18
01/26/2020    18
03/31/2020    18
02/24/2020    17
01/20/2020    16
01/23/2020    16
03/12/2020    16
03/21/2020    15
02/29/2020    14
03/28/2020    13
02/19/2020    13
03/08/2020    13
02/04/2020    13
02/12/2020    12
02/01/2020    12
02/07/2020    12
03/30/2020    12
02/20/2020    11
03/07/2020    11
03/29/2020    11
02/09/2020    11
02/06/2020    11

using groupby. On the right I have the frequency of values by date. The plot is

enter image description here

generated by

fig, ax = plt.subplots(figsize=(15,7))
df.groupby(['Date']).count()['NN'].plot(ax=ax)

I would like to have vertical straight lines in correspondence of the first highest values, i.e.

01/27/2020    55
03/03/2020    44
02/25/2020    39
03/11/2020    39
01/28/2020    39

How could I add these lines in my plot?

still_learning
  • 776
  • 9
  • 32
  • 1
    I think you're looking for the [.axvline](https://stackoverflow.com/a/24988486/12161501) method. – soyapencil Jul 20 '20 at 00:07
  • yes, I think you are right. However there is something that does not work for Date as they are strings. Lines are added at the beginning of the chart, and not in the corresponding values – still_learning Jul 20 '20 at 00:24
  • Ah right. Strings are bit more complicated. Best to convert it to `datetime` first -- see my answer below. – soyapencil Jul 20 '20 at 01:16

1 Answers1

2

The .axvline method should do the trick, regarding the vertical lines. If you try to plot a pandas DataFrame/Series using a set of strings for the index, pandas does some fancy footwork in the background.

You could mess around with the xticks and all sorts, but the easiest thing to do is to convert your column to datetime64.

First, let's make some fluff data:

import random
import pandas as pd
from string import ascii_lowercase

# Make some fluff
dates = [f'01/{random.randint(1,28)}/1901' for _ in range(100)]
fluff = [ascii_lowercase[random.randint(1,26):random.randint(1,26)]
         for _ in range(100)]

# Pack into a DataFrame
df = pd.DataFrame({'Date': dates, 'NN': fluff})

# Aggregate
counted = df.groupby('Date').count()

Taking a quick peek:

>>> counted
               NN
Date            
01/10/1901      2
01/11/1901      6
01/12/1901      2
...             ...

You can substitute this for whatever data you have. It's probably easiest if you convert your column before doing the groupby, so:

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
agg_df = df.groupby(['Date']).count()

fig, ax = plt.subplots(figsize=(8,6))
agg_df['NN'].plot(ax=ax)

Plot of 'NN' versus 'Date'

The plot is similar to above. Note that I'm using 8 by 6 for the figsize so that the figure will fit easier on the StackOverflow page. Change it back to 15 by 7 when running your code.

I've used %m/%d/%Y format, as that appears to be what you are using. See here for more info on date formatting: official datetime doc

Finally, get the vertical lines by using a datetime directly:

import datetime
ax.axvline(datetime.datetime(1901,01,10), color='k')

enter image description here

If you want to get vertical straight lines for the highest values, sort your aggregated DataFrame, then whack it in a for-loop.

for d in agg_df.sort_values('NN',ascending=False).index[:5]:
    ax.axvline(d, color='k')
soyapencil
  • 509
  • 4
  • 9