0

I'm trying to create a simple line chart with the date field on the X axis and the count of rows on the Y. I'm using the following code:

import plotly.express as px

data = {'Project':  ['Project 1', 'Project 2', ' Project 3', 'Project 4', 'Project 5', 'Project 6', 'Project 7', 'Project 8', 'Project 9', 'Project 10'],
        'Date': ['10/1/2020', '10/1/2020', '11/1/2020', '12/1/2020', '12/1/2020', '12/1/2020', '2/1/2021', '2/1/2021', '3/1/2021', '4/1/2021']}

df2 = pd.DataFrame(data, columns = ['Project','Date'])

fig = px.line(df2, x= "Date", y = "Project", title='<b>Project</b>')
fig.show()

But when I do that the project names are on the X axis instead of the count of projects for each date.

Does anyone know how I can add a count of rows so it shows the number of projects for each month of Date on the line chart?

hockeyhorror
  • 57
  • 2
  • 7
  • try `df.groupby('Start_Date').size().plot()` then `plt.show()`... always provide sample data so the answers can work on the same dataset and get comparable results however the approach, you might even get a copy-paste answer if you're lucky – RichieV Sep 25 '20 at 16:28
  • So while that worked for the matlib plot, it resulted in a blank Plotly line chart. If I use something like df['New_Column'] = len(df.Project.unique()) it results in the total count (36) on each date so it's a straight line instead of counting the unique projects for each date. I'm using work data, so I'll try to mock up something to use as an example in a bit – hockeyhorror Sep 25 '20 at 17:36
  • Okay I just edited the original post using some sample data to demonstrate my problem – hockeyhorror Sep 25 '20 at 17:57

2 Answers2

2

You need to groupby and count the rows in pandas before sending to plotly. Also, your sample does not show it, but if you expect different dates within the same month and you only care for the year/month then you need to apply some rounding before grouping (or extract the year & month from the date with data['Date'].dt.year and data['Date'].dt.month, whichever you prefer).

Take this slightly different sample with 10/2 thrown in there

import plotly.express as px

data = {'Project':  ['Project 1', 'Project 2', ' Project 3', 'Project 4', 'Project 5', 'Project 6', 'Project 7', 'Project 8', 'Project 9', 'Project 10'],
        'Date': ['10/1/2020', '10/2/2020', '11/1/2020', '12/1/2020', '12/2/2020', '12/1/2020', '2/1/2021', '2/1/2021', '3/1/2021', '4/1/2021']}

df2 = pd.DataFrame(data, columns = ['Project','Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

df_grouped = (
    df2.groupby(
        # normalize all dates to start of month
        df2['Date'].astype('datetime64[M]')
    )['Project'].count().rename('Count').to_frame()
)
df_grouped['Names'] = (
    df2.groupby(df2['Date'].astype('datetime64[M]')
    )['Project'].agg(',<br>    '.join)
)

print(df_grouped)

fig = px.line(
    df_grouped, y='Count', title='<b>Projects per month</b>',
    hover_data=['Names']
)
fig.write_html('fig1.html', auto_open=True)

Update: as requested, this new code shows the project names on hover.

Output

            Count                                          Names
Date
2020-10-01      2                    Project 1,<br>    Project 2
2020-11-01      1                                      Project 3
2020-12-01      3  Project 4,<br>    Project 5,<br>    Project 6
2021-02-01      2                    Project 7,<br>    Project 8
2021-03-01      1                                      Project 9
2021-04-01      1                                     Project 10

enter image description here

RichieV
  • 5,103
  • 2
  • 11
  • 24
  • Hi! I have one follow-up question. I want to add the names of the projects on hover. To do this I added the following: ```hovertext = 'Project: ' + df2['Project]'``` but when I do that the hover only shows the first project on any data point even if there are more than one for that month. Do you know how I can have multiple projects show on hover for a single month? – hockeyhorror Sep 28 '20 at 19:37
  • So when I use your new code I get the following error: ```cannot astype a datetimelike from [datetime64[ns]] to [datetime64[M]]``` Do you know why that might be? The original code worked fine when converting Date to datetime64[M]. – hockeyhorror Sep 30 '20 at 22:30
  • That's weird, since it runs fine on my lap... try upgrading pandas or change the code as in [this answer](https://stackoverflow.com/a/49869631/6692898) – RichieV Sep 30 '20 at 23:39
0

You can achieve that easily with Matplotlib's hist

Example:

from datetime import datetime
import matplotlib
import pandas as pd
from matplotlib.pyplot import hist

df = pd.DataFrame(
    {'a': [1, 2, 3, 4]}, 
    index=[datetime(2020, 9, 24), 
           datetime(2020, 9, 24), 
           datetime(2020, 9, 24), 
           datetime(2020, 9, 25)]
)

hist(df.index)

enter image description here

Not the prettiest histogram, but I'm sure you can tweak it to your needs from here ;)

Lucas Abbade
  • 757
  • 9
  • 19
  • 1
    That's a matplotlib answer to a plotly question. – vestland Sep 25 '20 at 17:24
  • @vestland I believe the question did not mention plotly at all in previous edits – RichieV Sep 25 '20 at 18:32
  • @RichieV There are tags. And `px.line` means `plotly.express.line`. – vestland Sep 25 '20 at 18:37
  • @vestland he had not mentioned Plotly at first. I believe RichieV answer is what OP is looking for anyway. I'll just leave this here as an alternative – Lucas Abbade Sep 25 '20 at 20:24
  • 1
    @LucasAbbade Sure thing! There's no rule against that. I believe that the initial question was tagged *only* python and plotly though. But it's always nice to see how things can be done in matplotlib as well. As long as there's a plotly answer for a plotly tagged question too. And it often happens that I only see matplotlib answers to plotly questions, perhaps partly because many users often tend to only take a closer look at questions with *zero* answers. So I find that posting answers under the "wrong" tag can sometimes prevent a "real" answer to come along. But enough about that for now... – vestland Sep 25 '20 at 21:21